I've loaded the same dataset into ES 1.0.0.Beta2 with the same index 
configuration as in the first post of this thread.

However, the numbers are now consistent when I call the same aggregation 
multiple times in a row, AND they match the numbers from the facets. 
This leads me to conclude that something broke between Beta2 and RC1!

I would like to test this on master, but I could not find any nightly 
builds of Elasticsearch. Is there a location where they are stored, or 
should I compile it myself?
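
For anyone who wants to repeat the check, here is a minimal Python sketch 
of the comparison I mean. It assumes the "buckets" list of each terms 
aggregation response has already been fetched and parsed; the helper name 
distinct_counts is made up for illustration, and the sample values are the 
three counts from my earlier message quoted below.

```python
def distinct_counts(runs, key):
    """Return the set of doc_count values seen for `key` across repeated runs.

    `runs` is a list of bucket lists, one per aggregation call, where each
    bucket is a dict like {"key": ..., "doc_count": ...}.
    """
    seen = set()
    for buckets in runs:
        for bucket in buckets:
            if bucket["key"] == key:
                seen.add(bucket["doc_count"])
    return seen


# The three results for the top term, taken from the earlier message.
earlier_runs = [
    [{"key": "totaltrafficbos", "doc_count": 2880}],
    [{"key": "totaltrafficbos", "doc_count": 2552}],
    [{"key": "totaltrafficbos", "doc_count": 2179}],
]

counts = distinct_counts(earlier_runs, "totaltrafficbos")
print(sorted(counts))  # more than one value means the runs disagree
```

A single value in the returned set means the repeated calls agreed for 
that term.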

On Friday, January 31, 2014 6:43:07 PM UTC+1, Nils Dijk wrote:
>
> Hi Binh Ly,
>
> Thanks for the response.
>
> I'm aware that the numbers are not exact (hence the link to issue #1305 in 
> my initial post), and I have been warning my colleagues and customers about 
> slightly incorrect numbers for some time already, to prepare them for the 
> moment we provide analytics with ES. But what bothers me is that they are 
> *inconsistent*.
>
> If you look at my gist, you will see that I ran the same aggs 3 times, one 
> right after another. If we just look at the top item, we see the following 
> results:
>
>    1. { "key": "totaltrafficbos", "doc_count": 2880 }
>    2. { "key": "totaltrafficbos", "doc_count": 2552 }
>    3. { "key": "totaltrafficbos", "doc_count": 2179 }
>    
> These results were taken within seconds of each other, without any change 
> to the number of documents in the index. If I run the aggs more often, the 
> count rotates between a handful of numbers. Is this also behavior one would 
> expect from the aggs? And if so, why do the facets show the same number 
> over and over again?
>
> Anyway, I will try to work my way through the aggs code this weekend to get 
> a better grasp of what we can and cannot do with it.
>
> -- Nils
>
> On Friday, January 31, 2014 6:18:43 PM UTC+1, Binh Ly wrote:
>>
>> Nils,
>>
>> This is just the nature of splitting data across shards. Actually, the 
>> terms facet has the same limitation (i.e. it will also give "approximate 
>> counts"). Neither the terms facet nor the terms aggregation is better or 
>> worse than the other; they are both approximations (using different 
>> implementations). It is correct that if you put all your data in 1 shard, 
>> then all the counts are exact. If you need to shard, you can increase the 
>> "shard_size" parameter inside the terms aggregation to "improve accuracy". 
>> Play with that number until it suits your purposes, but the important thing 
>> is that the counts are approximations, and more so the more documents you 
>> have in the index, so don't expect exact numbers from them if you have 
>> more than 1 shard.
>>
>> {
>>   "size": 0,
>>   "aggs": {
>>     "a": {
>>       "terms": {
>>         "field": "actor.displayName",
>>         "shard_size": 10000
>>       }
>>     }
>>   }
>> }
>>
>
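To illustrate the shard-level approximation Binh Ly describes, here is a toy 
Python model. It is not the actual Elasticsearch implementation: each 
"shard" is a plain list of terms, every shard reports only its top 
shard_size terms, and the coordinating step sums what it receives. The 
function name approx_top_terms and the sample data are invented for 
illustration.

```python
from collections import Counter


def approx_top_terms(shards, size, shard_size):
    """Merge per-shard top-`shard_size` term counts and return the top `size`."""
    merged = Counter()
    for docs in shards:
        local = Counter(docs)
        # A shard only reports its own top `shard_size` terms; the rest is dropped.
        for term, count in local.most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Two hypothetical shards; the true totals are a=6, b=8, c=8.
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["c"] * 5 + ["b"] * 4 + ["a"] * 1,
]

print(approx_top_terms(shards, size=3, shard_size=2))  # "a" and "c" undercounted
print(approx_top_terms(shards, size=3, shard_size=3))  # counts are exact
```

In this toy model a larger shard_size makes the counts more accurate, and 
for unchanged data the result is the same on every call.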
