Hi Nate,
The scores are only comparable within a single search, not across different
searches, because a score depends on the query as well as on the docs.
As for the threshold, I think you could use a count cutoff and keep only the
'x' best matches. I suggest that because I can't recall anything that uses
the score as an absolute metric to separate 'good' matches from 'not good'
ones.
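
To make the count cutoff concrete, here is a rough sketch in plain Java. It
is only an illustration, not Lucene code: the Hit class, the topX method and
the sample scores are made up for the example, and it assumes you have
already collected the (name, score) pairs from one single search.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopMatches {
    /** Hypothetical holder for one hit: a note file name and its raw score. */
    static final class Hit {
        final String name;
        final float score;

        Hit(String name, float score) {
            this.name = name;
            this.score = score;
        }
    }

    /**
     * Returns the x highest-scoring hits from ONE search, best first.
     * Scores from different searches must not be mixed in the same list.
     */
    static List<Hit> topX(List<Hit> hits, int x) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Sort a copy by score, highest first; the input list is left untouched.
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int limit = Math.min(Math.max(x, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("first.notes", 0.2f));   // made-up sample data
        hits.add(new Hit("second.notes", 7.5f));
        hits.add(new Hit("third.notes", 0.05f));
        for (Hit h : topX(hits, 2)) {
            System.out.println(h.name + " " + h.score);
        }
    }
}
```

With Lucene itself you would normally just ask the searcher for the top 'x'
hits instead of sorting them yourself. Note that a count cutoff always gives
back something, so on its own it won't tell you that a song has no matching
note file.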

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Thu, May 7, 2009 at 6:27 AM, Nate <n...@n4te.com> wrote:

> Hi all,
>
> First, the problem I'm trying to solve: I have two folders, each
> containing files. I need to match files in one folder with files in
> the other. Eg:
>
> notes/Michael Jackson - Don't Stop 'till You Get Enough.notes
> songs/Michael Jackson Don't stop until you get enough.mp3
>
> I provide the notes files, but the song files come from a user's music
> library, so often are not named well. I am attempting to use Lucene to
> find the most likely note file for each song file.
>
> I index the note files, then I use the StandardAnalyzer with carefully
> chosen stop words to search the index. The query uses each word in the
> song file name (w/o extension) as a term. Fuzzy matching is used for
> words with > 4 characters, and the fuzzy percentage is set to be 1 /
> termlength. This works ok so far, though I would love to hear opinions
> on any improvements I could make. This is my first use of Lucene, so
> I'm not sure I've chosen the best approach.
>
> The problem I'm having is: Sometimes there is a song file that has no
> matching note file. In this case I get back results with "low" scores,
> such as 0.2 or 0.05. A "really good" match gives me 7 or 8. I don't
> really understand what the scoring means, so I don't know what would
> be a reasonable threshold to ignore scores.
>
> I understand scores are not relevance percentages. I think the scores
> are only useful relative to other scores. Is this right? Are they only
> relative to scores from the same search, or from any search against
> the same index? How can I know if a score is "low", so I can ignore
> matches that aren't very good?
>
> Sorry if this has been discussed before. I have searched around a
> great deal and was unable to find a straight answer.
>
> Thanks!
> -Nate
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
