Shouldn't your analyzer also convert "Rochelle Rochelle" to "Rochelle" ?
paul Le 25-déc.-08 à 14:20, Israel Tsadok a écrit :
A recurring problem I have with Lucene results is when a document contains the same word over and over again. If for some reason I have a document containing "badger badger badger badger badger badger badger badger", itwould appear high on the search results for "badger", even though it's usually a useless document.What I would like to do is ignore repeating words when counting the termfrequency. At first, I thought I could achieve this by indexing with aTokenFilter that would skip repeated tokens, but then a search for e.g."Rochelle Rochelle" would return no results.What I would like is to index all 8 "badger"s, but have the frequency of"badger" saved as 1. Is that even possible? Digging around in Lucene code, I found term frequency calculationsin FreqProxTermsWriterPerField.addTerm() - is that where I need to look?Any help would be appreciated. Israel
smime.p7s
Description: S/MIME cryptographic signature