Hi Nate,

Scores are only comparable within the same search, not across different searches, because they depend on the query as well as on the documents.

As for the threshold, I guess you could use a count cutoff to keep only the 'x' best matches. I suggest that because I can't recall anything that uses the score as an absolute metric to separate 'good' matches from 'not good' ones.
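A minimal Java sketch of the count-cutoff idea, assuming the hits from one search have already been collected as (name, score) pairs. The `Hit` class and both helper methods are hypothetical names used for illustration, not part of Lucene. The fraction-of-best filter is an extra assumption that goes beyond the count cutoff; it only compares scores inside a single search and does not make scores comparable across searches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreCutoff {
    /** One search hit: a document name and its raw score (hypothetical type). */
    static final class Hit {
        final String name;
        final float score;

        Hit(String name, float score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Returns at most n hits from a single search, best score first. */
    static List<Hit> topN(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int limit = Math.min(Math.max(n, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    /** Keeps hits scoring at least fraction * (best score of this same search). */
    static List<Hit> withinFractionOfBest(List<Hit> hits, float fraction) {
        float best = 0f;
        for (Hit h : hits) {
            best = Math.max(best, h.score);
        }
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.score >= fraction * best) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```

With one match per song file, `topN(hits, 1)` would be the natural call. Neither helper can tell that a song has no matching note file at all, since the best hit of a bad search still ranks first.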
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw............

On Thu, May 7, 2009 at 6:27 AM, Nate <n...@n4te.com> wrote:

> Hi all,
>
> First, the problem I'm trying to solve: I have two folders, each
> containing files. I need to match files in one folder with files in
> the other. Eg:
>
> notes/Michael Jackson - Don't Stop 'till You Get Enough.notes
> songs/Michael Jackson Don't stop until you get enough.mp3
>
> I provide the notes files, but the song files come from a user's music
> library, so often are not named well. I am attempting to use Lucene to
> find the most likely note file for each song file.
>
> I index the note files, then I use the StandardAnalyzer with carefully
> chosen stop words to search the index. The query uses each word in the
> song file name (w/o extension) as a term. Fuzzy matching is used for
> words with > 4 characters, and the fuzzy percentage is set to be 1 /
> termlength. This works ok so far, though I would love to hear opinions
> on any improvements I could make. This is my first use of Lucene, so
> I'm not sure I've chosen the best approach.
>
> The problem I'm having is: Sometimes there is a song file that has no
> matching note file. In this case I get back results with "low" scores,
> such as 0.2 or 0.05. A "really good" match gives me 7 or 8. I don't
> really understand what the scoring means, so I don't know what would
> be a reasonable threshold to ignore scores.
>
> I understand scores are not relevance percentages. I think the scores
> are only useful relative to other scores. Is this right? Are they only
> relative to scores from the same search, or from any search against
> the same index? How can I know if a score is "low", so I can ignore
> matches that aren't very good?
>
> Sorry if this has been discussed before. I have searched around a
> great deal and was unable to find a straight answer.
>
> Thanks!
> -Nate