Re: MultiMatch with phrase_prefix support for boost ?

chee hoo lum Thu, 10 Apr 2014 03:24:08 -0700

Hi Dan,

I have been trying to use an edge n-gram analyzer, but I don't understand why
it produces the results in the following order:


1) "DISPLAY_NAME": "Happy Talk"
2) "DISPLAY_NAME": "Happenings"

Shouldn't the longer text ("Happy Talk") get the lower score?


*The query that I use:*
{   "explain" : true,
    "query" : {
     "multi_match": {
      "query": "*hap*",
      "fields": [ "DISPLAY_NAME.AUTOCOMPLETE^6", "PERFORMER.AUTOCOMPLETE" ]
     }
    },
    "sort" : [{
      "_score" : { "order" : "desc"}}]
}

*The result :*
https://gist.github.com/cheehoo/10365281


*The mapping:*

"DISPLAY_NAME": {
    "type": "multi_field",
    "fields": {
        "AUTOCOMPLETE": {
            "type": "string",
            "analyzer": "edge_ngram_keyword_lowercase_analyzer",
            "include_in_all": false
        },
        "NAME": {
            "type": "string",
            "analyzer": "standard",
            "include_in_all": false
        }
    }
},
"PERFORMER": {
    "type": "multi_field",
    "fields": {
        "AUTOCOMPLETE": {
            "type": "string",
            "analyzer": "edge_ngram_keyword_lowercase_analyzer",
            "include_in_all": false
        },
        "NAME": {
            "type": "string",
            "analyzer": "standard",
            "include_in_all": false
        }
    }
},


*And the analyzer settings:*
{
    "jdbc_dev": {
        "settings": {
            "index.analysis.analyzer.string_lowercase.filter": "lowercase",
            "index.number_of_replicas": "1",
            "index.version.created": "900299",
            "index.number_of_shards": "5",
            "index.analysis.analyzer.string_lowercase.tokenizer": "keyword",
            "index.analysis.tokenizer.my_edge_ngram_tokenizer.token_chars.0": "letter",
            "index.analysis.tokenizer.my_edge_ngram_tokenizer.min_gram": "2",
            "index.analysis.tokenizer.my_edge_ngram_tokenizer.token_chars.1": "digit",
            "index.analysis.tokenizer.my_edge_ngram_tokenizer.type": "edgeNGram",
            "index.analysis.tokenizer.my_edge_ngram_tokenizer.max_gram": "5",
            "index.analysis.analyzer.edge_ngram_keyword_lowercase_analyzer.filter.1": "lowercase",
            "index.analysis.analyzer.edge_ngram_keyword_lowercase_analyzer.filter.0": "my_edge_ngram_filter",
            "index.analysis.analyzer.edge_ngram_keyword_lowercase_analyzer.tokenizer": "keyword",
            "index.analysis.filter.my_edge_ngram_filter.type": "edgeNGram",
            "index.analysis.filter.my_edge_ngram_filter.min_gram": "2",
            "index.analysis.filter.my_edge_ngram_filter.max_gram": "30"
        }
    }
}
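
To make the analyzer settings above concrete, here is a minimal Python sketch
of the tokens that edge_ngram_keyword_lowercase_analyzer would roughly emit for
the two titles. It is an approximation under stated assumptions, not
Elasticsearch's actual implementation. The helper name edge_ngrams is
hypothetical. It models the keyword tokenizer as keeping the whole field value
as a single token, takes min_gram 2 and max_gram 30 from my_edge_ngram_filter,
and lowercases the output. It also assumes that the asterisks around "hap" in
the query are email emphasis rather than part of the search text, and that the
query text goes through the same analyzer as the field, since the mapping shows
no separate search analyzer.

```python
def edge_ngrams(text, min_gram=2, max_gram=30):
    """Approximate a keyword tokenizer followed by edge n-gram and lowercase filters.

    The whole input is treated as one token; the result is every leading
    prefix whose length lies between min_gram and max_gram, lowercased.
    """
    longest = min(max_gram, len(text))
    return [text[:n].lower() for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    # Tokens that would be indexed for each title (assumption: one token per field value).
    for title in ("Happy Talk", "Happenings"):
        grams = edge_ngrams(title)
        print(title, "->", len(grams), "grams:", grams)

    # Tokens produced from the query text if it is analyzed the same way.
    print("query ->", edge_ngrams("hap"))
```

Under these assumptions both titles produce the same number of grams, and both
contain every gram generated from "hap", so field length alone may not explain
the ordering. The explain output in the gist is the place to check which
scoring factor actually differs between the two hits.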



Thanks.



On Thu, Apr 10, 2014 at 3:24 AM, Dan Tuffery <[email protected]> wrote:

> You could use edge ngrams:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
>
> Or if you have the correct version of ElasticSearch, you can use the
> Completion Suggester:
>
> http://www.elasticsearch.org/blog/you-complete-me/
>
> Dan
>
>
>
> On Wednesday, April 9, 2014 8:53:21 AM UTC+1, cyrilforce wrote:
>
>> Hi Dan,
>>
>> Good to know. What would you suggest if I want the following behavior, with
>> the search running across multiple fields (e.g. DISPLAY_NAME and PERFORMER)?
>> When I type:
>>
>> *Hel*
>>
>> it should return (in the following order):
>> *Hel*l
>> *Hel*p
>> *Hel*lo
>> *Hel*lo World
>>
>>
>>
>> On Wed, Apr 9, 2014 at 3:45 PM, Dan Tuffery <[email protected]> wrote:
>>
>>> The boost you define in the 'multi_match' query is not being shown in the
>>> explain results, so it is not being applied to the score. It should be
>>> displayed in the weight, i.e.
>>>
>>> "description": "weight(DISPLAY_NAME^8:happy in 33593)
>>> [PerFieldSimilarity], result of:"
>>>
>>> The 'phrase_prefix' type is the issue; if you remove that type, the boost
>>> will be applied. So it doesn't look like you can combine 'multi_match'
>>> boosting with 'phrase_prefix' as the type.
>>>
>>> Dan
>>>
>>>
>>> On Tuesday, April 8, 2014 5:15:38 PM UTC+1, cyrilforce wrote:
>>>
>>>> Hi Dan,
>>>>
>>>> I have set an analyzer in the mapping because of stopwords:
>>>>
>>>>  "DISPLAY_NAME": {
>>>>                 "type": "string",
>>>>                 "analyzer": "standard"
>>>>    }
>>>>
>>>>
>>>>
>>>> *The query:*
>>>>
>>>>  "multi_match" : {
>>>>           "query" : "happy",
>>>>           "fields" : [ "DISPLAY_NAME^8", "PERFORMER" ],
>>>>           "type":   "phrase_prefix",
>>>>           "operator" : "AND"
>>>>         }
>>>>
>>>>
>>>> *The result returned with explain enabled:*
>>>> https://gist.github.com/cheehoo/10149517
>>>>
>>>>
>>>>  Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Apr 8, 2014 at 8:44 PM, Dan <[email protected]> wrote:
>>>>
>>>>> Turn on the explain feature to see why example 4 is not getting a
>>>>> higher score.
>>>>>
>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-explain.html
>>>>>
>>>>> I suspect it has something to do with the way you are indexing your
>>>>> data. If you still have issues a gist would help us.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On Tuesday, April 8, 2014 1:11:46 PM UTC+1, cyrilforce wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the query below:
>>>>>>
>>>>>>  "multi_match" : {
>>>>>>           "query" : "happy",
>>>>>>           "fields" : [ "DISPLAY_NAME^8", "PERFORMER" ],
>>>>>>           "type":   "phrase_prefix",
>>>>>>           "operator" : "AND"
>>>>>>         }
>>>>>>
>>>>>>
>>>>>> The results are returned in the following order:
>>>>>> 1)
>>>>>>  "_score": 2.1704028,
>>>>>>                 "_source": {
>>>>>>                     "DISPLAY_NAME": "*Happy*man",
>>>>>>
>>>>>> 2)
>>>>>>  "_score": 1.4312989,
>>>>>>                 "_source": {
>>>>>>                     "DISPLAY_NAME": "Boishakh (Version 1)",
>>>>>>                    "PERFORMER": "*Happy*",
>>>>>>
>>>>>> 3)
>>>>>>  "_score": 1.2510761,
>>>>>>                 "_source": {
>>>>>>                     "DISPLAY_NAME": "Franzl Im Happysound",
>>>>>>                    "PERFORMER": "Franzl & Die Psayrer",
>>>>>>
>>>>>> 4)
>>>>>>      "_score": 1.0920545,
>>>>>>                 "_source": {
>>>>>>                     "DISPLAY_NAME": "*Happy*",
>>>>>>                "PERFORMER": "Diandra Arjunaidi"
>>>>>>
>>>>>>
>>>>>> As the results show, why does #4 get a lower score when I have already
>>>>>> added a boost on the "DISPLAY_NAME^6" field? Is the boost not working
>>>>>> for a multi_match phrase query?
>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "elasticsearch" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/elasticsearch/yEAJ0Ym8PrU/unsubscribe.
>>>>>  To unsubscribe from this group and all its topics, send an email to
>>>>> [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/bfd38ce9-13b0-4d25-912f-28036992a6df%40googlegroups.com.
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Chee Hoo
>>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Chee Hoo
>>



-- 
Regards,

Chee Hoo

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGS0%2Bg8hzpscJ%2Bj8tmZVGuc-mSy6PKT5O5cQYtBwKtjPQ6Or7g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
