You are correct in your analysis of the fuzzy scoring. Fuzzy variants are
scored (relatively) the same as the exact match, because they are treated
the same when executed internally.
If you want to score exact matches higher, I would use a boolean
combination of an exact match and a fuzzy match. Semi-pseudo-query here:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_field": {
              "query": "car renting london",
              "operator": "and",
              "boost": 2
            }
          }
        },
        {
          "fuzzy_like_this": {}
        }
      ]
    }
  }
}
Basically, the match query uses the AND operator (so all terms are
required) and is given a boost of 2. Documents that contain every term
exactly match both clauses and pick up the boosted match score, so they
rank above documents that only match fuzzily, which get the default boost
of 1.
> Also I get results with more terms getting the same score, like "cheap car
> renting London", "offers car renting London".
>
The reason you are seeing results like this is that you are using the
fuzzy_like_this query. It's a combination of more_like_this and fuzzy.
The way MLT works is that it takes all the individual terms in your query,
builds a big boolean query and runs that against the index. Docs just
need to contain the terms, in no particular order. Fuzzy Like This works
the same way, except terms are allowed to match fuzzily. With MLT and FLT,
you're bound to find "off-target" results because these queries are sorta
like shotguns, looking for a wide spread of terms.
> *2) fuzzy query*
>
> That doesn't make what I want since it does not analyze the query (I
> think) and so it will treat the query in an unexpected way for my purposes
> of "free text" search
>
As an alternative, you can use the Match query and set the "fuzziness"
parameter. You'll get fuzzy matching like the fuzzy query, but with the
analysis that the Match query applies to the query text.
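To make that concrete, here is a rough sketch that builds such a request body in Python. It also folds in the boosted exact clause from earlier, so exact hits still rank first. The field name my_field, the helper names, the fuzziness of 1 and the boost of 2 are just placeholders for illustration; check how your ES version interprets the fuzziness value before relying on it.

```python
import json


def exact_match(field, text, boost=2.0):
    """Match clause requiring every analyzed term, boosted above the default of 1."""
    return {"match": {field: {"query": text, "operator": "and", "boost": boost}}}


def fuzzy_match(field, text, fuzziness=1):
    """Same match clause, but each analyzed term may also match fuzzily.

    `fuzziness` is assumed to be a maximum edit distance here.
    """
    return {
        "match": {
            field: {"query": text, "operator": "and", "fuzziness": fuzziness}
        }
    }


def exact_over_fuzzy(field, text):
    """Bool query: the fuzzy clause finds candidates, the exact clause adds score."""
    return {
        "query": {
            "bool": {
                "should": [exact_match(field, text), fuzzy_match(field, text)]
            }
        }
    }


# Print the JSON body you would send to the _search endpoint.
print(json.dumps(exact_over_fuzzy("my_field", "car renting london"), indent=2))
```

Since both clauses are Match queries, the query text goes through the field's analyzer in both cases, which is what the plain fuzzy query does not give you.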
As a general comment, trying to deal with misspellings and fuzziness is
always a trade-off between precision (the fraction of returned results that
are correct) and recall (the fraction of correct results that are returned).
As you increase fuzziness, you increase recall, since more of your correct
results are in your search hits, but you lose precision: the correct ones
may be at position 200. You'll always be fighting the precision/recall
battle.
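A toy calculation shows the trade-off. The document IDs and the two result lists below are made up purely for illustration; they are not from any real index.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a list of hits against the relevant set."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: four documents are actually what the user wants.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# Exact search: few hits, all of them correct, but half the good docs are missed.
strict_hits = ["doc1", "doc2"]

# Fuzzy search: every good doc is found, along with four off-target ones.
fuzzy_hits = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"]

print(precision_recall(strict_hits, relevant))  # (1.0, 0.5)
print(precision_recall(fuzzy_hits, relevant))   # (0.5, 1.0)
```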
I would instead search for exact matches, and prompt the user to fix
misspellings with suggesters. This makes your search and relevancy *vastly*
simpler, and tends to provide a better user experience because they can
just click the as-you-type suggestion or the "Did you mean?" link. Win-win
for everyone.
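As a rough sketch of that approach, the request below runs the exact search and asks the term suggester for corrections in the same call. The suggestion name did_you_mean, the helper name and the field my_field are placeholders I picked for the example; the phrase or completion suggesters may fit your case better.

```python
import json


def search_with_suggestions(field, text):
    """Exact (AND) match query plus a term-suggester section for 'Did you mean?'."""
    return {
        "query": {
            "match": {field: {"query": text, "operator": "and"}}
        },
        "suggest": {
            # "did_you_mean" is an arbitrary label; it keys the suggestions
            # in the response.
            "did_you_mean": {
                "text": text,
                "term": {"field": field},
            }
        },
    }


# Body for a _search request with a misspelled query.
print(json.dumps(search_with_suggestions("my_field", "car renting londen"), indent=2))
```

If the exact query comes back empty, you can show the options returned under the suggestion label as the "Did you mean?" link.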
-Zach
On Thursday, March 20, 2014 4:46:49 AM UTC-5, Adrian Luna wrote:
>
> Hi,
>
> Sorry that I am relatively fresh to elasticsearch so please don't be too
> harsh.
>
> I feel like I'm not being able to understand the behaviour of any of the
> fuzzy queries in ES.
>
> *1) match with fuzziness enabled*
>
> {
>   "query": {
>     "fuzzy_like_this_field": {
>       "field_name": {
>         "like_text": "car renting London",
>         "fuzziness": "0.5"
>       }
>     }
>   }
> }
>
> As I see it from my tests, this kind of query will give the same score to
> documents with field_name="car renting London" and "car ranting London" or
> "car renting Londen" for example. That means it will not score
> misspellings any lower. I can imagine that first the possible
> variants are computed and then the score is just computed with a
> "representative score" which is the same for every variant that matches the
> requirements.
>
> Am I right? If I am, is there any way to boost the exact match over the fuzzy
> match?
>
> Also I get results with more terms getting the same score, like "cheap car
> renting London", "offers car renting London". That's something I cannot get
> to understand. When I use the explain API, it seems that the resulting
> score is a sum of the different matches with its internal weightings,
> tf-idf, etc. but it seems to not be considering the terms outside the
> query, while I would expect the exact match to score at least slightly
> higher.
>
> Am I missing something here? Is it just the expected result and I am just
> being too demanding?
>
> *2) fuzzy query*
>
> That doesn't make what I want since it does not analyze the query (I
> think) and so it will treat the query in an unexpected way for my purposes
> of "free text" search
>
> *3) fuzzy_like_this or fuzzy_like_this_field*
>
> This other search gets rid of the first problem in point 1, since as I
> read from the documentation, it seems to use some tricks to avoid favouring
> rare terms (misspellings will be here) over more frequent terms, etc. but
> it's still giving the same score to the exact match and to matches where
> other terms are present.
>
> Is there any way to get the expected behaviour? By this I mean being able
> to execute almost free-text queries with some fuzziness to get rid of
> possible misspellings in the query terms, but with an (at least for me)
> more exhaustive score computation. If not, is there any other more complex
> query or a function_score to get such behaviour?
>
> Thank you very much, any comment will be pretty much appreciated. Also, if
> I am not right in my suppositions, any clarification will be very welcome.
>
>