Re: how scoring computed in wildcard and prefix query

chee hoo lum Thu, 10 Apr 2014 23:09:20 -0700

Sorry, one more thing I would like to know: is there any way to stop the
rarity of a term (a low docFreq) from yielding a higher score?



On Fri, Apr 11, 2014 at 1:52 PM, chee hoo lum <[email protected]> wrote:

> Thanks Dan.
>
> How about queryWeight? It is only computed when using a wildcard or prefix
> query. Also, when you say "The term 'Happenings' appears four times in your
> index", is that regardless of which field it appears in?
>
> Thanks
>
>
> On Fri, Apr 11, 2014 at 1:25 PM, Dan Tuffery <[email protected]> wrote:
>
>> The score is computed using Lucene's scoring formula:
>>
>>
>> https://lucene.apache.org/core/3_6_2/api/all/org/apache/lucene/search/Similarity.html
>>
>> The idf is the inverse document frequency, which gives a higher score
>> to rarer terms in the index. The term 'Happenings' appears in four
>> documents in your index (docFreq=4), while the term 'Happier' appears in
>> only two (docFreq=2), so 'Happier' gets a higher idf.
>>
>> Dan
>>
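For reference, the idf values in the explain output quoted below are consistent with the classic Lucene TF-IDF formula described on the Similarity page linked above, idf = 1 + ln(maxDocs / (docFreq + 1)). A minimal sketch in Python, assuming that default similarity is in use (the function name `classic_idf` is made up for illustration):

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as in Lucene's classic TF-IDF similarity: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# docFreq and maxDocs are taken from the explain output in this thread.
idf_happier = classic_idf(doc_freq=2, max_docs=6566786)     # explain reports 15.598923
idf_happenings = classic_idf(doc_freq=4, max_docs=6566786)  # explain reports 15.088098

print(round(idf_happier, 4), round(idf_happenings, 4))
```

The term with the lower docFreq gets the larger idf, and that is the only factor that differs between the two hits.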
>> On Friday, April 11, 2014 4:45:58 AM UTC+1, cyrilforce wrote:
>>>
>>> Hi,
>>>
>>> I have a question about how the score is computed for the following
>>> query:
>>>
>>> {
>>>   "from" : 0,
>>>   "size" : 60,
>>>   "explain" : true,
>>>   "track_scores" : true,
>>>   "query" : {
>>>         "bool" : {
>>>                 "should" : [
>>>                   { "prefix": { "DISPLAY_NAME" : { "value" : "hap", "rewrite" : "top_terms_10", "boost" : "3.0" }}},
>>>                   { "prefix": {"PERFORMER" :{ "value" : "hap" }}}
>>>
>>>                 ]
>>>         }
>>>    }
>>>  }
>>>
>>> and it produces these results:
>>>
>>> 1)
>>>   "DISPLAY_NAME": "Happier?",
>>>   "_explanation": {
>>>     "value": 2.7100196,
>>>     "description": "product of:",
>>>     "details": [
>>>       {
>>>         "value": 5.420039,
>>>         "description": "sum of:",
>>>         "details": [
>>>           {
>>>             "value": 5.420039,
>>>             "description": "sum of:",
>>>             "details": [
>>>               {
>>>                 "value": 5.420039,
>>>                 "description": "weight(DISPLAY_NAME:happier^3.0 in 32661) [PerFieldSimilarity], result of:",
>>>                 "details": [
>>>                   {
>>>                     "value": 5.420039,
>>>                     "description": "score(doc=32661,freq=1.0 = termFreq=1.0\n), product of:",
>>>                     "details": [
>>>                       {
>>>                         "value": 0.34746242,
>>>                         "description": "queryWeight, product of:",
>>>                         "details": [
>>>                           { "value": 3, "description": "boost" },
>>>                           { "value": 15.598923, "description": "idf(docFreq=2, maxDocs=6566786)" },
>>>                           { "value": 0.0074249236, "description": "queryNorm" }
>>>                         ]
>>>                       },
>>>                       {
>>>                         "value": 15.598923,
>>>                         "description": "fieldWeight in 32661, product of:",
>>>                         "details": [
>>>                           {
>>>                             "value": 1,
>>>                             "description": "tf(freq=1.0), with freq of:",
>>>                             "details": [
>>>                               { "value": 1, "description": "termFreq=1.0" }
>>>                             ]
>>>                           },
>>>                           { "value": 15.598923, "description": "idf(docFreq=2, maxDocs=6566786)" },
>>>                           { "value": 1, "description": "fieldNorm(doc=32661)" }
>>>                         ]
>>>                       }
>>>                     ]
>>>                   }
>>>                 ]
>>>               }
>>>             ]
>>>           }
>>>         ]
>>>       },
>>>       { "value": 0.5, "description": "coord(1/2)" }
>>>
>>>
>>> 2)
>>>   "DISPLAY_NAME": "Happenings",
>>>   "_explanation": {
>>>     "value": 2.5354335,
>>>     "description": "product of:",
>>>     "details": [
>>>       {
>>>         "value": 5.070867,
>>>         "description": "sum of:",
>>>         "details": [
>>>           {
>>>             "value": 5.070867,
>>>             "description": "sum of:",
>>>             "details": [
>>>               {
>>>                 "value": 5.070867,
>>>                 "description": "weight(DISPLAY_NAME:happenings^3.0 in 23093) [PerFieldSimilarity], result of:",
>>>                 "details": [
>>>                   {
>>>                     "value": 5.070867,
>>>                     "description": "score(doc=23093,freq=1.0 = termFreq=1.0\n), product of:",
>>>                     "details": [
>>>                       {
>>>                         "value": 0.33608392,
>>>                         "description": "queryWeight, product of:",
>>>                         "details": [
>>>                           { "value": 3, "description": "boost" },
>>>                           { "value": 15.088098, "description": "idf(docFreq=4, maxDocs=6566786)" },
>>>                           { "value": 0.0074249236, "description": "queryNorm" }
>>>                         ]
>>>                       },
>>>                       {
>>>                         "value": 15.088098,
>>>                         "description": "fieldWeight in 23093, product of:",
>>>                         "details": [
>>>                           {
>>>                             "value": 1,
>>>                             "description": "tf(freq=1.0), with freq of:",
>>>                             "details": [
>>>                               { "value": 1, "description": "termFreq=1.0" }
>>>                             ]
>>>                           },
>>>                           { "value": 15.088098, "description": "idf(docFreq=4, maxDocs=6566786)" },
>>>                           { "value": 1, "description": "fieldNorm(doc=23093)" }
>>>                         ]
>>>                       }
>>>                     ]
>>>                   }
>>>                 ]
>>>               }
>>>             ]
>>>           }
>>>         ]
>>>       },
>>>       { "value": 0.5, "description": "coord(1/2)" }
>>>     ]
>>>   }
>>> }
>>>
>>>
>>> Since the display names in both documents match "Hap", I expected them
>>> to have the same score, but they score differently, as shown above.
>>> Inspecting the explanation further, I found that the difference is in
>>> the queryWeight->idf and fieldWeight->idf fields:
>>>
>>> 1) "value": 15.598923,
>>>    "description": "idf(docFreq=2, maxDocs=6566786)"
>>>
>>> 2) "value": 15.088098,
>>>    "description": "idf(docFreq=4, maxDocs=6566786)"
>>>
>>>
>>> I would like to know why the values differ, how they are computed, and
>>> what docFreq is. I would also like to know what queryWeight is: when I
>>> use a wildcard or prefix query the score is computed with queryWeight,
>>> otherwise only with fieldWeight.
>>>
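The numbers in the two explanations above can be multiplied back together to check how the pieces fit. A minimal sketch in Python, assuming the structure shown in the explain output (queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, final score = queryWeight * fieldWeight * coord); the function name `explain_score` is hypothetical:

```python
def explain_score(boost, idf, query_norm, tf=1.0, field_norm=1.0, coord=0.5):
    """Recombine the factors listed in the explain output into one score."""
    query_weight = boost * idf * query_norm  # "queryWeight, product of:"
    field_weight = tf * idf * field_norm     # "fieldWeight in <doc>, product of:"
    return query_weight * field_weight * coord

# Factors copied from the explain output for the two hits.
happier = explain_score(boost=3, idf=15.598923, query_norm=0.0074249236)
happenings = explain_score(boost=3, idf=15.088098, query_norm=0.0074249236)

# Compare with the reported scores 2.7100196 and 2.5354335.
print(happier, happenings)
```

Because idf enters both queryWeight and fieldWeight, the score depends on idf squared; boost, queryNorm, tf, fieldNorm and coord are identical for the two hits.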
>>> I am using &search_type=dfs_query_then_fetch&preference=_primary in
>>> the query.
>>>
>>> And here is the gist for full result :
>>> https://gist.github.com/cheehoo/10439849
>>>
>>>
>>> Thanks.
>>>
>>>  --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/VKgbWgrgzSg/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/eb221209-ddc9-4257-9a5f-a1f11a39f088%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/eb221209-ddc9-4257-9a5f-a1f11a39f088%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Regards,
>
> Chee Hoo
>



-- 
Regards,

Chee Hoo

