I've loaded the same dataset into ES 1.0.0.Beta2 with the same index
configuration as at the start of this topic.
Now, however, the numbers are consistent when I call the same aggregation
multiple times in a row, AND the numbers match those of the facets.
This leads me to conclude that something broke between Beta2 and RC1!
I would like to test this on master, but I could not find any nightly
builds of elasticsearch. Is there a location where they are stored or
should I compile it myself?
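
A minimal Python sketch of such a repeat-and-compare check, under stated assumptions: `run_agg` is a hypothetical callable that executes the search and returns the parsed JSON response, the response is assumed to have the layout `aggregations` -> aggregation name -> `buckets`, and `fake_run_agg` is only a stand-in that replays canned counts (here the three counts from the quoted message below) so the sketch runs without a cluster.

```python
from itertools import cycle
from typing import Callable, Dict, List


def collect_counts(run_agg: Callable[[], dict], agg_name: str,
                   runs: int = 3) -> List[Dict[str, int]]:
    """Run the same aggregation `runs` times; return key -> doc_count per run."""
    results = []
    for _ in range(runs):
        buckets = run_agg()["aggregations"][agg_name]["buckets"]
        results.append({b["key"]: b["doc_count"] for b in buckets})
    return results


def is_consistent(results: List[Dict[str, int]]) -> bool:
    """True when every run returned exactly the same buckets and counts."""
    return all(r == results[0] for r in results[1:])


def fake_run_agg(counts):
    """Hypothetical stand-in for a real search call: replays canned counts."""
    it = cycle(counts)

    def run():
        return {"aggregations": {"a": {"buckets": [
            {"key": "totaltrafficbos", "doc_count": next(it)}]}}}
    return run


# The three counts reported in the quoted message: they differ between runs.
rc1 = collect_counts(fake_run_agg([2880, 2552, 2179]), "a")
print(is_consistent(rc1))  # False
```

Against a real cluster, `run_agg` would be replaced by a function that sends the aggregation request and parses the JSON body; everything else stays the same.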
On Friday, January 31, 2014 6:43:07 PM UTC+1, Nils Dijk wrote:
>
> Hi Binh Ly,
>
> Thanks for the response.
>
> I'm aware that the numbers are not exact (hence the link to issue #1305 in
> my initial post), and have been advocating slightly incorrect numbers with
> my colleagues and customers for some time already to prepare them for the
> moment we provide analytics with ES. But what bothers me is that they are
> *inconsistent*.
>
> If you look at my gist, you'll see that I ran the same aggs 3 times in quick
> succession. If we just look at the top item, we see the following
> results:
>
> 1. { "key": "totaltrafficbos", "doc_count": 2880 }
> 2. { "key": "totaltrafficbos", "doc_count": 2552 }
> 3. { "key": "totaltrafficbos", "doc_count": 2179 }
>
> These results were taken within seconds of each other, without any change to
> the number of documents in the index. If I run them more often, the count
> rotates between a handful of numbers. Is this also behavior one would expect
> from the aggs? And if so, why do the facets show the same number over and over
> again?
>
> Anyway, I will try to work my way through the aggs code this weekend to get a
> better grasp of what we can and cannot do with it.
>
> -- Nils
>
> On Friday, January 31, 2014 6:18:43 PM UTC+1, Binh Ly wrote:
>>
>> Nils,
>>
>> This is just the nature of splitting data around in shards. Actually the
>> terms facet has the same limitations (i.e. it will also give "approximate
>> counts"). Neither the terms facet nor the terms aggregation is better or
>> worse than the other - they are both approximations (using different
>> implementations). It is correct that if you put all your data in 1 shard,
>> then all the counts are exact. If you need to shard, you can increase the
>> "shard_size" parameter inside the terms aggregation to "improve accuracy".
>> Play with that number until it suits your purposes, but the important thing
>> is that the counts are approximations, and more so the more documents you
>> have in the index - so just don't expect absolute numbers from them if you
>> have more than 1 shard.
>>
>> {
>>   "size": 0,
>>   "aggs": {
>>     "a": {
>>       "terms": {
>>         "field": "actor.displayName",
>>         "shard_size": 10000
>>       }
>>     }
>>   }
>> }
>>
>
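
To illustrate the `shard_size` explanation quoted above, here is a small Python simulation of a top-N merge across shards. It is a sketch under assumptions, not Elasticsearch's actual implementation: each shard is modelled as a plain term -> count mapping, `sharded_top_terms` is a hypothetical helper, and the per-shard counts are made up. Note that this model is deterministic, so it shows why counts can be approximate but not why they would change between identical runs.

```python
from collections import Counter


def sharded_top_terms(shards, size, shard_size):
    """Merge each shard's top `shard_size` terms, then return the top `size`."""
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Made-up per-shard term counts; the true totals are b=9, a=8, c=7.
shards = [
    {"a": 5, "b": 4, "c": 1},
    {"c": 6, "b": 5, "a": 3},
]

print(sharded_top_terms(shards, size=1, shard_size=1))  # [('c', 6)]
print(sharded_top_terms(shards, size=1, shard_size=2))  # [('b', 9)]
print(sharded_top_terms(shards, size=3, shard_size=3))  # [('b', 9), ('a', 8), ('c', 7)]
```

With `shard_size=1`, term "b" is never the top term on any single shard, so it is missed entirely even though it has the highest total; raising `shard_size` lets each shard report more candidates and the merged result becomes exact.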