Thanks a lot!
I now understand better how IDF works in ES. As you said, the unchanging 
value was caused by sharding. After I added enough documents, I do see the 
IDF value change, along with docFreq and maxDocs in the explain output.
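
As a rough sketch of where the numbers in this thread come from, the 
following assumes the classic Lucene TF/IDF similarity (the default in ES at 
the time), where idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(termFreq), 
both computed per shard. It also assumes that for a single-term query the 
query weight normalizes to 1, so the score equals tf * idf * fieldNorm, and 
that the fieldNorm of the three-word title is stored as 0.5 (1/sqrt(3) rounded 
down by Lucene's lossy one-byte norm encoding). The function names are made 
up for illustration and are not part of any ES API.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene IDF, computed from per-shard statistics."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(term_freq: int) -> float:
    """Classic Lucene term-frequency factor."""
    return math.sqrt(term_freq)


def field_weight(term_freq: int, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    """Score of a single-term match, assuming the query weight is 1."""
    return tf(term_freq) * idf(doc_freq, max_docs) * field_norm


# A shard holding exactly one document, which contains the term:
# this is the idf(docFreq=1, maxDocs=1) case from the explain output.
print(round(idf(1, 1), 8))  # 0.30685282

# Document 1: "xbox" (one term, fieldNorm assumed to be 1.0)
score_doc1 = field_weight(term_freq=1, doc_freq=1, max_docs=1, field_norm=1.0)

# Document 2: "xbox xbox xbox" (three terms, fieldNorm assumed to be 0.5)
score_doc2 = field_weight(term_freq=3, doc_freq=1, max_docs=1, field_norm=0.5)

print(score_doc1 > score_doc2)  # True

# Once a shard holds more documents, the IDF is no longer stuck at 0.3068...
print(idf(1, 10) > idf(1, 1))  # True
```

Under these assumptions the shorter title wins because the length 
normalization (fieldNorm) penalizes the longer field more than the extra 
term frequency helps, and the IDF only starts to move once a single shard 
holds more than one document.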


On Wednesday, March 11, 2015 at 9:54:13 AM UTC+8, Doug Turnbull wrote:

> A couple of things are going on here
>
> First read "Why is Relevance Broken". Your IDF might not be changing due 
> to sharding.
>
> https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html
>
> Second
> docFreq reflects this term's actual document frequency (how many documents 
> the term occurs in)
> maxDocs reflects the total number of documents on this shard
>
> Third
> maxDocs (and docFreq) do not reflect deletions. 
>
> Lastly,
> I presume you can find the documents you think you're adding in the index?
>
> Hope that helps
> -Doug
>
> On Tue, Mar 10, 2015 at 9:46 PM, Xudong You <[email protected]> wrote:
>
>> Thanks!
>> I tried explain and now better understand how the score is computed. But I 
>> still have a question about the IDF score. The IDF in the explain output of 
>> my query is:
>> {
>>   "value": 0.30685282,
>>   "description": "idf(docFreq=1, maxDocs=1)"
>> }
>>
>> What do docFreq and maxDocs mean in the output above? Per the IDF definition, 
>> the score should be affected by the total number of documents in the index, 
>> but the value seems to always be 0.30685282 no matter how many docs I insert 
>> into the index.
>>
>>
>> On Tuesday, March 10, 2015 at 5:39:56 PM UTC+8, Nhật Quang Phan wrote:
>>
>>>
>>> You can enable explain for your query and see how elasticsearch 
>>> calculates score:
>>>
>>> {
>>> "explain": true,
>>> "query": {
>>> "match": {
>>> "title": "xbox"
>>> }
>>> }
>>> }
>>>
>>> On Tuesday, March 10, 2015 at 3:15:50 PM UTC+7, Xudong You wrote:
>>>>
>>>> I have two documents as follows:
>>>>
>>>> 1.
>>>> {
>>>> "title":"xbox"
>>>> }
>>>>
>>>> 2.
>>>> {
>>>> "title":"xbox xbox xbox"
>>>> }
>>>>
>>>> Then I search the documents with following query:
>>>> {
>>>> "query":{"match":{"title":"xbox"}}
>>>> }
>>>>
>>>> ES returns result as follows:
>>>> {"took":133,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.30685282,
>>>> "hits":[
>>>> {"_index":"storetest1","_type":"type","_id":"1","_score":0.30685282,"_source":{"title":"xbox","keywords":["xbox"]}},
>>>>
>>>> {"_index":"storetest1","_type":"type","_id":"2","_score":0.26574233,"_source":{"title":"xbox xbox xbox","keywords":["xbox"]}}]}}
>>>>
>>>>
>>>> My question is: why did #1 get a higher score than #2? I expected #2 to 
>>>> score higher than #1, since "xbox" appears more times in the title of #2.
>>>>
>
>
>
> -- 
> Doug Turnbull
> Search Relevance Lead
> OpenSource Connections <http://o19s.com>
>  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2e18efff-0b66-41b0-98e6-1eb73bde6896%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
