David Spencer wrote:

[c] "interesting words" - uses code from MoreLikeThis to give a table of all interesting
words in the current "source" doc ordered by score.
Remember score is idf*tf as per Dougs mail (and as per my
hopefully correct understanding of these things). This page is of course more of a debugging
tool that something a normal user would see. One possible area of improvement that jumped out at me after reviewing this table is using stemming, say, allowing more words in the generated query when 2 words have the same stem.

Actually, the analyzer should do that, shouldn't it? For example, I have stemming analyzers for a variety of languages that both stem and remove stop words - it seems silly to me to duplicate that functionality when it's so easily provided by the analyzer. Given that, I would suggest removing the stop word functionality from this class as it is not needed and only confuses things.


Regards,

Bruce Ritchie
http://www.jivesoftware.com/

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature



Reply via email to