Re: Fuzzy query scoring based on levenshtein distance

Octavian Wed, 29 Oct 2014 08:54:46 -0700

Hi Blab,

I also want to return a score based on the Levenshtein distance from a fuzzy 
query. Could you elaborate on "writing a (native) script to handle the 
scoring"? Did you actually write a script that calculates the 
distance, or did you use some ES properties?


Thank you,

On Thursday, March 14, 2013 7:52:02 PM UTC+2, the blab wrote:
>
> Thanks for your response. By "against the min_similarity" I meant the 
> minimum value for the similarity of the fuzzy terms, i.e. the 
> min_similarity parameter provided in the query I posted. 
>
> To clarify, there are two "scores" being calculated in the query: the 
> Levenshtein distance used to determine which terms to use, and the actual 
> scoring of the returned results. I wanted the Levenshtein distance to be 
> used to score the returned results, but I don't think this is possible.
>
> For future readers I solved this issue by creating a custom score query 
> and writing a (native) script to handle the scoring.
>
> Thanks
>
> On Thursday, March 14, 2013 7:02:03 AM UTC, simonw wrote:
>>
>> Hey,
>>
>> This is not entirely true. The FuzzyQuery uses the Levenshtein distance 
>> to find the terms in the index, which are subsequently used in a Boolean 
>> OR query or in a ConstantScore filter, depending on the rewrite method you 
>> choose. The default also just takes the top 50 terms within a certain LD 
>> and then builds a query out of them. The scoring will just be the similarity 
>> of your scoring model, so TF/IDF (VectorSpace) by default.
>>
>> I don't understand your last sentence, what do you mean by 'against the 
>> min_similarity'?
>>
>> simon
>>
>> On Tuesday, March 12, 2013 6:45:09 PM UTC+1, the blab wrote:
>>>
>>> Hi,
>>>
>>> I have a question about scoring for fuzzy queries. If I understand 
>>> correctly, fuzzy queries find any appropriate matches by calculating 
>>> similarity using the Levenshtein distance, but this similarity value is not 
>>> used when calculating the document's score. Instead the document's score is 
>>> based on the tf/idf of the matched term. Is this correct? Is it possible to 
>>> instead score based on similarity to the queried term for fuzzy queries? 
>>> E.g. I have the below custom_score query. I'd like the score returned to be 
>>> the similarity score used to evaluate against the min_similarity.
>>>
>>> {
>>>     "query": {
>>>         "custom_score": {
>>>             "query": {
>>>                 "fuzzy": {
>>>                     "firstname": {
>>>                         "value": "Jack",
>>>                         "min_similarity": "0.5",
>>>                         "max_expansions": 1
>>>                     }
>>>                 }
>>>             },
>>>             "script": "_score"
>>>         }
>>>     }
>>> }
>>>
>>
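
The thread does not include the native script itself, so the following is only a minimal sketch of the kind of calculation such a scoring script might perform: compute the Levenshtein distance between the queried value and a matched term, then normalize it into a score between 0 and 1. The function names (levenshtein, similarity_score), the lowercasing, and the normalization by the longer string's length are all assumptions made for illustration; they are not taken from Elasticsearch or Lucene, and Lucene's own fuzzy similarity may be computed differently. A real native script would be written in Java against the Elasticsearch scripting API; Python is used here only to show the logic in a standalone form.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity_score(query: str, term: str) -> float:
    """Hypothetical score: 1.0 for identical strings, falling towards 0.0.

    Assumption: normalize by the longer string's length and ignore case.
    """
    longest = max(len(query), len(term))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(query.lower(), term.lower()) / longest


if __name__ == "__main__":
    for term in ("Jack", "Jake", "Jacky", "John"):
        print(term, similarity_score("Jack", term))
```

In a script-based scoring setup, a value like this would replace the plain "_score" shown in the query above, so that documents whose firstname is closer to the queried value rank higher regardless of TF/IDF.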
