fuzziness & score computation

Adrian Luna Thu, 20 Mar 2014 02:47:29 -0700

Hi, 

I am relatively new to Elasticsearch, so please don't be too harsh.

I feel like I can't understand the behaviour of any of the fuzzy queries
in ES.

*1) match with fuzziness enabled*

{
  "query": {
    "fuzzy_like_this_field": {
      "field_name": {
        "like_text": "car renting London",
        "fuzziness": "0.5"
      }
    }
  }
}

As far as I can see from my tests, this kind of query gives the same score
to documents with field_name="car renting London", "car ranting London" or
"car renting Londen", for example. That is, misspellings are not penalized
at all. I imagine that the possible variants are computed first, and the
score is then computed with a "representative score" that is the same for
every variant that meets the requirements.

Am I right? If I am, is there any way to boost the exact match over the
fuzzy match?
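
One possible way to favour exact matches, sketched here as an untested
assumption rather than a confirmed answer: wrap two "match" clauses in a
"bool" query, one with fuzziness and one without but with a "boost". A
document that matches the text exactly satisfies both clauses, while a
misspelled one satisfies only the fuzzy clause. The helper name and the
boost value of 2.0 are hypothetical, and the fuzziness value is simply the
one from the query above.

```python
import json


def build_fuzzy_query_with_exact_boost(field, text, fuzziness="0.5", exact_boost=2.0):
    """Build a bool query body: a fuzzy match plus a boosted non-fuzzy match.

    The function name and default values are illustrative assumptions;
    the boost in particular would need tuning against real data.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Tolerant clause: also matches misspelled variants.
                    {"match": {field: {"query": text, "fuzziness": fuzziness}}},
                    # Strict clause: only adds to the score when the
                    # analyzed terms match without any fuzziness.
                    {"match": {field: {"query": text, "boost": exact_boost}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    body = build_fuzzy_query_with_exact_boost("field_name", "car renting London")
    print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the search request body. Whether this is
enough to separate "car renting London" from "cheap car renting London" is
exactly what I am unsure about.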

I also get results containing additional terms with the same score, such
as "cheap car renting London" or "offers car renting London". That is
something I cannot understand. When I use the explain API, the resulting
score appears to be a sum over the different matches with their internal
weightings (tf-idf, etc.), but it does not seem to take into account the
terms in the field that are not in the query, whereas I would expect the
exact match to score at least slightly higher.

Am I missing something here? Is it just the expected result and I am just
being too demanding?

*2) fuzzy query*

That doesn't do what I want, since it does not analyze the query (I think),
so it would treat the query in a way that doesn't suit my "free text"
search use case.

*3) fuzzy_like_this or fuzzy_like_this_field*

This other query gets rid of the first problem in point 1 since, as I read
in the documentation, it seems to use some tricks to avoid favouring rare
terms (which is where misspellings end up) over more frequent terms, etc.
But it still gives the same score to an exact match and to matches where
other terms are present.

Is there any way to get the expected behaviour? By this I mean being able
to execute almost free-text queries with some fuzziness to cope with
possible misspellings in the query terms, but with a (at least to me) more
thorough score computation. If not, is there a more complex query or a
function_score that would achieve this?
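
To make the "expected behaviour" concrete, here is a toy scoring function
in plain Python. It is not how Elasticsearch or Lucene computes scores, and
the names edit_distance and toy_score are made up for this illustration.
It only shows the ordering I would like to see: an exact match first, then
misspelled variants, with extra terms in the field also lowering the score.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def toy_score(query: str, field_value: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 only for an exact, same-length match."""
    q_terms = query.lower().split()
    d_terms = field_value.lower().split()
    if not q_terms or not d_terms:
        return 0.0
    total = 0.0
    for q in q_terms:
        # Best similarity of this query term against any term in the field.
        total += max(1.0 - edit_distance(q, d) / max(len(q), len(d))
                     for d in d_terms)
    term_score = total / len(q_terms)
    # Penalise field terms that were not asked for in the query.
    length_penalty = len(q_terms) / max(len(q_terms), len(d_terms))
    return term_score * length_penalty


if __name__ == "__main__":
    query = "car renting London"
    for doc in ("car renting London", "car ranting London",
                "car renting Londen", "cheap car renting London"):
        print(doc, toy_score(query, doc))
```

With this toy function the exact match is the only document that reaches
the maximum score, which is the kind of ranking I would like the real query
to produce.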

Thank you very much; any comments will be much appreciated. Also, if my
assumptions are wrong, any clarification is very welcome.
