Thanks a lot! I now understand better how IDF works in ES. As you said, the unchanging value was caused by sharding. After I added enough documents, I do see the IDF value change, along with docFreq and maxDocs in the explain output.
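
For anyone finding this thread later, here is a rough sketch of how the numbers quoted below seem to come out. It assumes Lucene's classic TF-IDF similarity (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq)), that the query-side weight normalizes to 1 for a single-term query, and that the length norm of the three-term title (1/sqrt(3), about 0.577) is stored lossily as 0.5. The function names are mine, not part of any ES or Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene-style IDF, computed from per-shard statistics."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(term_freq: int) -> float:
    """Classic Lucene-style term frequency factor."""
    return math.sqrt(term_freq)


def field_weight(term_freq: int, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    """tf * idf * fieldNorm; assumes the query weight is 1 (single term)."""
    return classic_tf(term_freq) * classic_idf(doc_freq, max_docs) * field_norm


# Each document alone on its own shard: docFreq=1, maxDocs=1.
idf_single = classic_idf(doc_freq=1, max_docs=1)

# Doc 1: title "xbox" (one term, assumed fieldNorm 1.0).
score_doc1 = field_weight(term_freq=1, doc_freq=1, max_docs=1, field_norm=1.0)

# Doc 2: title "xbox xbox xbox" (three terms, assumed stored fieldNorm 0.5).
score_doc2 = field_weight(term_freq=3, doc_freq=1, max_docs=1, field_norm=0.5)

print(f"idf={idf_single:.6f} doc1={score_doc1:.6f} doc2={score_doc2:.6f}")
```

Under these assumptions the sketch gives about 0.306853 for the IDF and for document 1, and about 0.265742 for document 2, which lines up with the explain output and the two scores in the thread. It also suggests why #1 outranks #2: the length normalization of the longer title outweighs the gain from the higher term frequency. With 5 shards and only 2 documents, each document probably sits alone on a shard, so docFreq and maxDocs both stay at 1 until more documents arrive.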

On Wednesday, March 11, 2015 at 9:54:13 AM UTC+8, Doug Turnbull wrote:
> A couple of things are going on here.
>
> First, read "Why is Relevance Broken". Your IDF might not be changing due
> to sharding.
>
> https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html
>
> Second,
> docFreq reflects this term's actual document frequency (how many documents
> the term occurs in)
> maxDocs reflects the total number of documents on this shard
>
> Third,
> maxDocs (and docFreq) do not reflect deletions.
>
> Lastly,
> I presume you can find the documents you think you're adding in the index?
>
> Hope that helps
> -Doug
>
> On Tue, Mar 10, 2015 at 9:46 PM, Xudong You <[email protected]> wrote:
>
>> Thanks!
>> I tried explain and now understand better how the score is computed. But I
>> still have a question about the IDF score. The IDF in the explain output of
>> my query is:
>> {
>>   "value": 0.30685282,
>>   "description": "idf(docFreq=1, maxDocs=1)"
>> }
>>
>> What do docFreq and maxDocs mean here? Per the IDF definition, the score
>> should be affected by the total number of documents in the index, but the
>> value seems to be always 0.30685282 no matter how many docs I insert into
>> the index.
>>
>>
>> On Tuesday, March 10, 2015 at 5:39:56 PM UTC+8, Nhật Quang Phan wrote:
>>
>>>
>>> You can enable explain for your query and see how elasticsearch
>>> calculates the score:
>>>
>>> {
>>>   "explain": true,
>>>   "query": {
>>>     "match": {
>>>       "title": "xbox"
>>>     }
>>>   }
>>> }
>>>
>>> On Tuesday, March 10, 2015 at 3:15:50 PM UTC+7, Xudong You wrote:
>>>>
>>>> I have two documents as follows:
>>>>
>>>> 1.
>>>> {
>>>>   "title":"xbox"
>>>> }
>>>>
>>>> 2.
>>>> {
>>>>   "title":"xbox xbox xbox"
>>>> }
>>>>
>>>> Then I search the documents with the following query:
>>>> {
>>>>   "query":{"match":{"title":"xbox"}}
>>>> }
>>>>
>>>> ES returns the following result:
>>>> {"took":133,"timed_out":false,
>>>>  "_shards":{"total":5,"successful":5,"failed":0},
>>>>  "hits":{"total":2,"max_score":0.30685282,
>>>>  "hits":[
>>>>  {"_index":"storetest1","_type":"type","_id":"1","_score":0.30685282,
>>>>   "_source":{"title":"xbox","keywords":["xbox"]}},
>>>>  {"_index":"storetest1","_type":"type","_id":"2","_score":0.26574233,
>>>>   "_source":{"title":"xbox xbox xbox","keywords":["xbox"]}}]}}
>>>>
>>>>
>>>> My question is, why did #1 get a higher score than #2? I thought #2 would
>>>> score higher than #1, since "xbox" appears more often in the title of #2.
>>>>
>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/3ab150a6-eccc-4145-a7e9-af16f3ff6752%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
> Doug Turnbull
> Search Relevance Lead
> OpenSource Connections <http://o19s.com>
>

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2e18efff-0b66-41b0-98e6-1eb73bde6896%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
