Re: FuzzyLikeThisQuery what does maxNumTerms mean

markharw00d Wed, 09 May 2007 14:10:14 -0700

The shortlisting isn't based on stop words - a score is produced to
prioritise term selections. The score uses the IDF (inverse document
frequency) of the original term and mixes in the "edit-distance" for
each of the fuzzy variations of original terms. Care is taken to ensure
that in the query produced, fuzzy variants all use the root term's IDF
(or if the root term is not in the index the average IDF of all of the
variants is used by each variant). This avoids the rarer variants
ranking more highly than the source term in query results.
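
To make the selection concrete, here is a rough standalone sketch in Python of the idea described above. It is not the Lucene source. The names (Variant, shortlist), the similarity value standing in for edit distance, and the way IDF and similarity are multiplied together are all assumptions made for illustration; the idf() helper assumes the classic 1 + ln(numDocs / (docFreq + 1)) form.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """One fuzzy expansion of a source term (hypothetical structure)."""
    text: str
    similarity: float  # 1.0 = exact match; lower = larger edit distance
    doc_freq: int      # number of documents containing this variant


def idf(doc_freq: int, num_docs: int) -> float:
    # Assumed classic IDF form: rarer terms get a higher value.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def shortlist(expansions, num_docs, max_num_terms):
    """Return the max_num_terms best (score, variant, root) tuples.

    expansions maps each root term to its list of Variant objects,
    which may include the root term itself if it is in the index.
    """
    scored = []
    for root, variants in expansions.items():
        if not variants:
            continue
        # Document frequency of the root term, 0 if it is not indexed.
        root_df = next((v.doc_freq for v in variants if v.text == root), 0)
        if root_df > 0:
            # Every variant borrows the root term's IDF.
            shared_idf = idf(root_df, num_docs)
        else:
            # Root not in the index: use the average IDF of the variants.
            shared_idf = sum(idf(v.doc_freq, num_docs) for v in variants) / len(variants)
        for v in variants:
            # Assumed mix: shared IDF scaled by closeness to the root.
            scored.append((shared_idf * v.similarity, v.text, root))
    # Keep only the highest scoring terms across all roots.
    return heapq.nlargest(max_num_terms, scored)


if __name__ == "__main__":
    example = {
        "smith": [
            Variant("smith", 1.0, 100),
            Variant("smyth", 0.8, 2),
            Variant("smithe", 0.83, 1),
        ]
    }
    for score, text, root in shortlist(example, num_docs=1000, max_num_terms=2):
        print(f"{text} (from {root}): {score:.3f}")
```

Because every variant shares the root's IDF, a rare misspelling such as "smyth" cannot outscore "smith" itself; only its closeness to the root separates it from the other variants.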


Mark


bhecht wrote:

Thanks Mark for the detailed explanation.
So one more question if I may:
How is the list shortened to include <maxNumTerms> terms only?
In your example you had 2 stop words which of course are not included in the
token stream.
But what happens if you get more than maxNumTerms terms, how are the
maxNumTerms selected from the list?
Thanks.



