Even with IDF held constant, the edit distance should remain a factor
in the rankings, so I would have thought this would give you what you
need.
Can you supply a more detailed example? Either print the rewritten
query or use the explain function.
Cheers
Mark
On 27 Aug 2009, at 13:22, Berkes Adam wrote:
Hi,
In our Java project we use a (slightly modified) version of the
FuzzyLikeThis query, which is documented as follows:
"For each source term the fuzzy variants are held in a BooleanQuery
with no coord factor (because
we are not looking for matches on multiple variants in any one doc).
Additionally, a specialized
TermQuery is used for variants and does not use that variant term's
IDF because this would favour rarer
terms eg misspellings. Instead, all variants use the same IDF
ranking (the one for the source query
term) and this is factored into the variant's boost. If the source
query term does not exist in the
index the average IDF of the variants is used."
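The scheme in the quoted description can be sketched roughly as
follows. This is not the FuzzyLikeThis implementation: the class name,
the similarity formula (1 minus edit distance divided by the longer
term's length) and the sample IDF value are all assumptions made for
illustration. It only shows that when every variant shares one IDF,
the edit distance alone decides the relative boost, so an exact match
gets the largest boost.

```java
class VariantBoostSketch {

    // Plain Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure: 1.0 for identical terms, lower as the
    // edit distance grows. Lucene's own formula may differ.
    static double similarity(String source, String variant) {
        int maxLen = Math.max(source.length(), variant.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(source, variant) / maxLen;
    }

    // Every variant uses the same IDF (that of the source term), so the
    // boost differs between variants only through the similarity.
    static double variantBoost(String source, String variant, double sharedIdf) {
        return sharedIdf * similarity(source, variant);
    }

    public static void main(String[] args) {
        String source = "cat";
        double sharedIdf = 2.0; // hypothetical IDF of the source term
        for (String variant : new String[] {"cat", "cart", "bat", "cut"}) {
            System.out.println(variant + " -> "
                    + variantBoost(source, variant, sharedIdf));
        }
    }
}
```

Note that the boost is only one factor in the final score; term
frequency and length normalisation can still reorder documents.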
In most cases it performs well, but for a short query term with (as
usual) a large number of variants, the exact matches end up scattered
among the variant matches, which is not very useful. The results
should instead be ordered so that exact matches rank above variant
matches (or are forcibly made more relevant), with each group sorted
by relevance.
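To make the desired ordering concrete, here is a small sketch outside
of Lucene: a two-level sort that puts exact matches first and orders
each group by descending score. The Hit class, its fields and the
exactFirst method are hypothetical names used only for illustration.
This is not an existing Lucene or contrib API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ExactFirstOrderingSketch {

    // Hypothetical hit: the index term that matched and its score.
    static final class Hit {
        final String matchedTerm;
        final double score;

        Hit(String matchedTerm, double score) {
            this.matchedTerm = matchedTerm;
            this.score = score;
        }
    }

    // Returns a sorted copy: exact matches of queryTerm come first,
    // and within each group hits are ordered by descending score.
    static List<Hit> exactFirst(List<Hit> hits, String queryTerm) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(
            // false (exact match) sorts before true (variant match)
            Comparator.comparing((Hit h) -> !h.matchedTerm.equals(queryTerm))
                .thenComparing(
                    Comparator.comparingDouble((Hit h) -> h.score).reversed()));
        return sorted;
    }
}
```

With this ordering a variant hit can never rank above an exact hit,
however high the variant's score is.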
Is there a simple solution to this problem, or an already implemented
contrib query class that handles it?
Best regards,
Adam Berkes,
Intland Software
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org