Sorry, one more point I would like to know: is there any way to stop the rarity of a term (docFreq) from yielding a higher score?
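
For reference, the idf values in the explain output quoted below are consistent with the classic Lucene TF-IDF formula, idf = 1 + ln(maxDocs / (docFreq + 1)). Here is a minimal sketch in Python that redoes the arithmetic from the explain output under that assumption. The function names `idf` and `explain_score` are hypothetical helpers for illustration, not part of Elasticsearch or Lucene, and the numeric inputs are copied from the explain output.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed formula: classic Lucene TF-IDF idf = 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_score(doc_freq, max_docs, boost, query_norm,
                  tf=1.0, field_norm=1.0, coord=0.5):
    # Mirrors the structure of the explain output:
    #   queryWeight = boost * idf * queryNorm
    #   fieldWeight = tf * idf * fieldNorm
    #   score       = coord * queryWeight * fieldWeight
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return coord * query_weight * field_weight


# Values copied from the explain output in this thread.
MAX_DOCS = 6566786
QUERY_NORM = 0.0074249236
BOOST = 3.0

happier = explain_score(2, MAX_DOCS, BOOST, QUERY_NORM)     # docFreq=2
happenings = explain_score(4, MAX_DOCS, BOOST, QUERY_NORM)  # docFreq=4

print("idf(docFreq=2):", idf(2, MAX_DOCS))
print("idf(docFreq=4):", idf(4, MAX_DOCS))
print("score for 'happier':", happier)
print("score for 'happenings':", happenings)
```

Because idf enters both queryWeight and fieldWeight, the final score grows with the square of idf, which is why the term with the smaller docFreq ends up ranked higher even though tf, boost, and fieldNorm are identical.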

On Fri, Apr 11, 2014 at 1:52 PM, chee hoo lum <[email protected]> wrote:

> Thanks Dan.
>
> How about queryWeight? It is only computed when using a wildcard or prefix query.
> Also, when you say "The term '*Hap*penings' appears four times in your
> index", is that regardless of which field?
>
> Thanks
>
>
> On Fri, Apr 11, 2014 at 1:25 PM, Dan Tuffery <[email protected]> wrote:
>
>> The scoring is computed using the Lucene scoring:
>>
>> https://lucene.apache.org/core/3_6_2/api/all/org/apache/lucene/search/Similarity.html
>>
>> The *idf* is the inverse document frequency, which gives a higher score
>> to the rarer terms in the index. The term '*Hap*penings' appears four
>> times in your index and the term 'Happier' appears twice in your index;
>> therefore 'Happier' has a higher idf score.
>>
>> Dan
>>
>> On Friday, April 11, 2014 4:45:58 AM UTC+1, cyrilforce wrote:
>>>
>>> Hi,
>>>
>>> I have a question on how the scoring is computed for the following
>>> query:
>>>
>>> {
>>>   "from" : 0,
>>>   "size" : 60,
>>>   "explain" : true,
>>>   "track_scores" : true,
>>>   "query" : {
>>>     "bool" : {
>>>       "should" : [
>>>         { "prefix": { "DISPLAY_NAME" : { "value" : "*hap*", "rewrite" : "top_terms_10", "boost" : "3.0" }}},
>>>         { "prefix": {"PERFORMER" :{ "value" : "*hap*" }}}
>>>       ]
>>>     }
>>>   }
>>> }
>>>
>>> and it produces this result:
>>>
>>> 1)
>>> "DISPLAY_NAME": "*Happier?*",
>>> , "_explanation": {
>>>   "value": 2.7100196,
>>>   "description": "product of:",
>>>   "details": [
>>>     {
>>>       "value": 5.420039,
>>>       "description": "sum of:",
>>>       "details": [
>>>         {
>>>           "value": 5.420039,
>>>           "description": "sum of:",
>>>           "details": [
>>>             {
>>>               "value": 5.420039,
>>>               "description": "weight(DISPLAY_NAME:happier^3.0 in 32661) [PerFieldSimilarity], result of:",
>>>               "details": [
>>>                 {
>>>                   "value": 5.420039,
>>>                   "description": "score(doc=32661,freq=1.0 = termFreq=1.0\n), product of:",
>>>                   "details": [
>>>                     {
>>>                       "value": 0.34746242,
>>>                       "description": "queryWeight, product of:",
>>>                       "details": [
>>>                         { "value": 3, "description": "boost" },
>>>                         { "value": 15.598923, "description": "idf(docFreq=2, maxDocs=6566786)" },
>>>                         { "value": 0.0074249236, "description": "queryNorm" }
>>>                       ]
>>>                     },
>>>                     {
>>>                       "value": 15.598923,
>>>                       "description": "fieldWeight in 32661, product of:",
>>>                       "details": [
>>>                         {
>>>                           "value": 1,
>>>                           "description": "tf(freq=1.0), with freq of:",
>>>                           "details": [
>>>                             { "value": 1, "description": "termFreq=1.0" }
>>>                           ]
>>>                         },
>>>                         { "value": 15.598923, "description": "idf(docFreq=2, maxDocs=6566786)" },
>>>                         { "value": 1, "description": "fieldNorm(doc=32661)" }
>>>                       ]
>>>                     }
>>>                   ]
>>>                 }
>>>               ]
>>>             }
>>>           ]
>>>         }
>>>       ]
>>>     },
>>>     {
>>>       "value": 0.5,
>>>       "description": "coord(1/2)"
>>>     }
>>>
>>>
>>> 2)
>>> "DISPLAY_NAME": *"Hap*penings",
>>> ,
>>> "_explanation": {
>>>   "value": 2.5354335,
>>>   "description": "product of:",
>>>   "details": [
>>>     {
>>>       "value": 5.070867,
>>>       "description": "sum of:",
>>>       "details": [
>>>         {
>>>           "value": 5.070867,
>>>           "description": "sum of:",
>>>           "details": [
>>>             {
>>>               "value": 5.070867,
>>>               "description": "weight(DISPLAY_NAME:happenings^3.0 in 23093) [PerFieldSimilarity], result of:",
>>>               "details": [
>>>                 {
>>>                   "value": 5.070867,
>>>                   "description": "score(doc=23093,freq=1.0 = termFreq=1.0\n), product of:",
>>>                   "details": [
>>>                     {
>>>                       "value": 0.33608392,
>>>                       "description": "queryWeight, product of:",
>>>                       "details": [
>>>                         { "value": 3, "description": "boost" },
>>>                         { "value": 15.088098, "description": "idf(docFreq=4, maxDocs=6566786)" },
>>>                         { "value": 0.0074249236, "description": "queryNorm" }
>>>                       ]
>>>                     },
>>>                     {
>>>                       "value": 15.088098,
>>>                       "description": "fieldWeight in 23093, product of:",
>>>                       "details": [
>>>                         {
>>>                           "value": 1,
>>>                           "description": "tf(freq=1.0), with freq of:",
>>>                           "details": [
>>>                             { "value": 1, "description": "termFreq=1.0" }
>>>                           ]
>>>                         },
>>>                         { "value": 15.088098, "description": "idf(docFreq=4, maxDocs=6566786)" },
>>>                         { "value": 1, "description": "fieldNorm(doc=23093)" }
>>>                       ]
>>>                     }
>>>                   ]
>>>                 }
>>>               ]
>>>             }
>>>           ]
>>>         }
>>>       ]
>>>     },
>>>     {
>>>       "value": 0.5,
>>>       "description": "coord(1/2)"
>>>     }
>>>   ]
>>> }
>>> }
>>>
>>>
>>> As both of the display names in the documents matched "*Hap*", they should
>>> have the same score; however, they yield different scores, as shown above.
>>> On further inspection of the explanation, I found that the difference
>>> is in the queryWeight->idf and fieldWeight->idf fields:
>>>
>>> 1) "value": 15.598923,
>>>    "description": "idf(docFreq=2, maxDocs=6566786)"
>>>
>>> 2) "value": 15.088098,
>>>    "description": "idf(docFreq=4, maxDocs=6566786)"
>>>
>>>
>>> I would like to know why the values are different, how they are
>>> computed, and what docFreq is. I would also like to know what
>>> queryWeight is: it appears in the score computation only when I use a
>>> wildcard or prefix query; otherwise only fieldWeight is used.
>>>
>>> I am using *&search_type=dfs_query_then_fetch&preference=_primary* in
>>> the query.
>>>
>>> And here is the gist for the full result:
>>> https://gist.github.com/cheehoo/10439849
>>>
>>>
>>> Thanks.
>>>
>>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/VKgbWgrgzSg/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/eb221209-ddc9-4257-9a5f-a1f11a39f088%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Regards,
>
> Chee Hoo

--
Regards,

Chee Hoo
