> I have noticed that mail messages seem to get unusually high scores
> from the indexer. While holmes makes the problem much less of an issue
> (since it separates the conversation results), it still seems like
> something worth fixing. I can't seem to figure out exactly why the
> scoring is so off, but an initial guess would be the ease with which
> we can add hotwords for email (subject lines) as opposed to most other
> backends.
(From http://wiki.apache.org/jakarta-lucene/LuceneFAQ) Lucene automatically adds a weight that decreases with the length of the field (with the default similarity it is 1/sqrt of the number of terms in the field). In other words, terms in short fields (like sender name, email address, subject) get a higher weight (known as 'boost') than terms in the text. The same holds for document metadata: it carries more weight than the document data/text.

As I understand it, Beagle searches several Lucene indexes and merges the results based on their scores. Somewhere in that process, it recalculates each score based on the age of the document. However, the absolute values of Lucene scores are not directly comparable; only the ratios between scores from the same index (and hence the ranking) are. In that sense, I don't think scores from multiple indexes should be compared directly. The ranking within a particular backend is meaningful and, IMO, that is the correct way to do it.

- dBera

_______________________________________________
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
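The two effects discussed in the reply (short fields scoring higher, and raw scores from different indexes not being comparable) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Beagle's or Lucene's code: `length_norm` assumes the 1/sqrt(number of terms) length factor of Lucene's classic default similarity, the hit lists and scores are invented, and `naive_merge` and `max_normalized_merge` are hypothetical helpers. Dividing by each index's top score is just one common way to drop an index's absolute scale while keeping its internal ratios; it is not what Beagle does, and Beagle's age-based rescoring is not modeled.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical hit lists: index name -> [(doc id, raw score)].
Hits = Dict[str, List[Tuple[str, float]]]


def length_norm(num_terms: int) -> float:
    """Length factor of Lucene's classic default similarity: 1/sqrt(num_terms).

    Real Lucene also folds in index-time boosts and stores the result in a
    single byte, so actual values are coarser than this.
    """
    return 1.0 / math.sqrt(num_terms)


def naive_merge(hits: Hits) -> List[str]:
    """Merge by raw score, as if scores from different indexes were comparable."""
    pooled = [(score, doc) for docs in hits.values() for doc, score in docs]
    return [doc for _, doc in sorted(pooled, key=lambda p: p[0], reverse=True)]


def max_normalized_merge(hits: Hits) -> List[str]:
    """Divide each index's scores by that index's top score, then merge.

    This keeps the ratios (and so the ranking) inside each index and
    discards the index-specific absolute scale. Ties keep input order
    because Python's sort is stable, even with reverse=True.
    """
    pooled = []
    for docs in hits.values():
        top = max((score for _, score in docs), default=0.0)
        for doc, score in docs:
            pooled.append((score / top if top > 0 else 0.0, doc))
    return [doc for _, doc in sorted(pooled, key=lambda p: p[0], reverse=True)]


if __name__ == "__main__":
    # A 3-term subject line gets a much larger length factor than a 300-term body.
    print(round(length_norm(3), 3), round(length_norm(300), 3))  # 0.577 0.058

    # Invented scores: the mail index simply produces larger raw numbers.
    hits = {
        "mail": [("mail-a", 2.4), ("mail-b", 1.2)],
        "files": [("file-a", 0.6), ("file-b", 0.3)],
    }
    print(naive_merge(hits))           # ['mail-a', 'mail-b', 'file-a', 'file-b']
    print(max_normalized_merge(hits))  # ['mail-a', 'file-a', 'mail-b', 'file-b']
```

With these made-up numbers the naive merge lists both mail hits before any file hit only because the mail index happens to produce larger raw scores, while the normalized merge interleaves the backends and leaves each backend's own ranking untouched.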