On 28/01/2012 11:22, Uwe Schindler wrote:
-----Original Message-----
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Saturday, January 28, 2012 10:33 AM
To: 'java-user@lucene.apache.org'
Subject: Does Fuzzy Search scores the same as Exact Match
All things being equal, does a fuzzy match give the same score as an
exact match?
I.e. if I do a search for farmin and it matches two docs, one on the term
farmin and the other on the term farming, will it score farming higher or
score both the same?
YES, it depends on the Fuzzy configuration (rewrite method, ...), but
the default does so!
Uwe
So how do I change it? It seems like a funny default to have.
Maybe I was not clear: it should score "farming" higher than "farmin" by
default, but the default rewrite mode also takes TF/IDF into account (in
addition to the distance-based boost).
Maybe there was some confusion in your original question, so to make it clear:
If you search for "farming", "farming" (exact match) should score higher
than "farmin" (distance 1). With the default rewrite mode this is correct for
the boosting, but if the typo is rarer in the corpus, then based on TF-IDF
the typo can still end up scoring higher, because rarer terms get a higher
IDF. You can prevent that by using the rewrite mode that *only* takes the
Levenshtein distance as an inverse boost and does not use TF-IDF
=> http://goo.gl/0eJ47
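To make the effect concrete, here is a toy model of the two scoring styles. It is NOT Lucene's scoring code: the distance-based boost formula, the simplified idf, the corpus size and the document frequencies are all assumptions chosen for illustration. It shows how multiplying a distance boost by idf can let a rare typo ("farmin") outrank the exact term ("farming"), while a boost-only score keeps the exact match on top.

```java
/**
 * Toy model (NOT Lucene's actual scoring code) contrasting two ways of
 * scoring the terms a fuzzy query expands to:
 *   1. distance boost multiplied by an idf-like factor
 *   2. distance boost only
 * The formulas and numbers are illustrative assumptions.
 */
public class FuzzyScoringSketch {

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from empty prefix of a to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse arrays: prev now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed boost: 1.0 for an exact match, decreasing as the distance grows. */
    static double distanceBoost(String query, String term) {
        int d = editDistance(query, term);
        return 1.0 - (double) d / Math.min(query.length(), term.length());
    }

    /** Simplified classic-style idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Boost times idf: a rare term gets a large idf that can outweigh its lower boost. */
    static double boostTimesIdf(String query, String term, int docFreq, int numDocs) {
        return distanceBoost(query, term) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1000;                     // hypothetical corpus size
        String query = "farming";
        String[] terms = {"farming", "farmin"};
        int[] docFreqs = {200, 2};              // hypothetical: correct spelling common, typo rare
        for (int i = 0; i < terms.length; i++) {
            String t = terms[i];
            System.out.printf("%-8s distance=%d  boost*idf=%.3f  boostOnly=%.3f%n",
                    t, editDistance(query, t),
                    boostTimesIdf(query, t, docFreqs[i], numDocs),
                    distanceBoost(query, t));
        }
    }
}
```

With these made-up numbers, "farmin" gets the higher boost*idf value while "farming" keeps the higher boost-only value. In Lucene itself, the boost-only behaviour behind the link above is, as far as I know, provided by MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite, set on the query via setRewriteMethod; check the javadoc for your Lucene version.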
Thanks. If I understand you correctly, you are saying that this different
rewrite mode is better for subclasses of MultiTermQuery. Is that because it's
silly, for these types of queries, to consider the term frequency of the
matching term in the doc and the uniqueness of the term in the index, since
the fuzzy query may well have matched a term the user wasn't trying to match?
Or is it because the Levenshtein distance is just much more important, and you
don't want it diluted by the TF/IDF part?
Paul