Sorry, but your file at https://gist.github.com/8803745.git is broken: it
contains invalid JSON, so it cannot be processed.

It would be helpful to provide a script with escaped JSON in bulk format.
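
As a rough sketch of what such a script could look like: the Python below reads newline-separated documents, reports lines that are not valid JSON, and emits action/source line pairs in the shape the _bulk endpoint expects. The function name `to_bulk`, the index name "aggsbug" and the type name "doc" are assumptions for illustration only; they are not taken from the gist.

```python
import json


def to_bulk(lines, index, doc_type):
    """Convert newline-separated JSON documents into bulk-format lines.

    Returns (bulk_lines, errors). Each valid document becomes two lines:
    an "index" action line followed by the re-serialized source line.
    errors holds (line_number, message) for lines that are not valid JSON.
    """
    bulk_lines = []
    errors = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            doc = json.loads(raw)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            errors.append((number, str(exc)))
            continue
        action = {"index": {"_index": index, "_type": doc_type}}
        bulk_lines.append(json.dumps(action))
        # json.dumps escapes control characters, so each document stays on one line
        bulk_lines.append(json.dumps(doc))
    return bulk_lines, errors


if __name__ == "__main__":
    # Hypothetical sample input; the second line is deliberately broken.
    sample = ['{"actor": {"displayName": "totaltrafficbos"}}', '{"actor": ']
    bulk, errs = to_bulk(sample, "aggsbug", "doc")
    # A bulk request body must end with a newline.
    print("\n".join(bulk) + "\n", end="")
    for number, message in errs:
        print("line %d is not valid JSON: %s" % (number, message))
```

The output can be written to a file and posted to _bulk in chunks; the invalid lines are only reported here, not repaired.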

From what I suspect, you do not use the keyword analyzer for faceting/agg'ing,
so you will get all kinds of unwanted results. Whether that explains your
fluctuating aggs results, I cannot tell. It is rather uncommon to use
"facets" and "aggs" side by side.

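To illustrate the keyword analyzer point, here is a minimal sketch that builds an index-creation body in which the aggregated field is mapped with the keyword analyzer, so the whole field value is kept as a single term. It assumes the 1.0-era string mapping syntax; the helper name `keyword_mapping` and the type name "doc" are hypothetical, and the field name is the one from the query quoted further down.

```python
import json


def keyword_mapping(doc_type, field_path):
    """Build a mapping body that applies the keyword analyzer to a dotted field path."""
    # Leaf definition: a string field analyzed as one single token.
    node = {"type": "string", "analyzer": "keyword"}
    # Wrap the leaf in nested "properties" objects, innermost name first.
    for name in reversed(field_path.split(".")):
        node = {"properties": {name: node}}
    return {"mappings": {doc_type: node}}


if __name__ == "__main__":
    body = keyword_mapping("doc", "actor.displayName")
    print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body when creating the index, before any documents are loaded.
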
Jörg



On Tue, Feb 4, 2014 at 3:01 PM, Nils Dijk wrote:

> To follow up,
>
> I have a self-contained test suite at https://gist.github.com/thanodnl/8803745 for
> this problem. It contains two files:
>
>    1. aggsbug.sh
>    2. aggsbug.json
>
> The .json file contains ~1M newline-separated documents to load into the
> database; I was not able to create a curl request to load them directly
> into the index.
> The .sh file (https://gist.github.com/thanodnl/8803745/raw/aggsbug.sh)
> contains the instructions for recreating this behavior.
>
> I have run these against the following versions:
>
>    1. 1.0.0.Beta2
>    2. 1.0.0.RC1
>    3. 1.0.0-SNAPSHOT as compiled from the git 1.0 branch on commit
>    0f8b41ffad9b5ecdfd543d7c73edcf404e6fc763
>
> When run on 1.0.0.Beta2 it gives the same output consistently when I run
> the _search over and over again.
> When run on 1.0.0.RC1 it gives me multiple different outcomes
> comparable to the numbers I posted earlier in the thread.
> When run on 1.0.0-SNAPSHOT it behaves the same as on 1.0.0.RC1.
>
> That it still was working on 1.0.0.Beta2 proves to me that it is a bug
> that got into RC1. I could not find any related ticket on the issues page
> of the github repository. Hopefully this is enough information to recreate
> the problem.
>
> The json file is quite big and could cause problems when you open the gist
> in a browser. Cloning the gist locally will work best:
> $ git clone https://gist.github.com/8803745.git
>
> I do not really know how to move on from here. Do you want me to open an
> issue for this problem at github.com/elasticsearch/elasticsearch? It
> would be nice to fix this problem before a release of 1.0.0 since that is
> the first release containing the aggregations for analytics.
>
> On Tuesday, February 4, 2014 12:31:10 PM UTC+1, Nils Dijk wrote:
>
>> I've loaded the same dataset in ES1.0.0.Beta2 with the same index
>> configuration as in the topic start.
>>
>> However now the numbers are consistent if I call the same aggregation
>> multiple times in a row AND the numbers match the numbers of the facets.
>> This leads me to the conclusion something is broken from Beta2 to RC1!
>>
>> I would like to test this on master, but I could not find any nightly
>> builds of elasticsearch. Is there a location where they are stored or
>> should I compile it myself?
>>
>> On Friday, January 31, 2014 6:43:07 PM UTC+1, Nils Dijk wrote:
>>>
>>> Hi Binh Ly,
>>>
>>> Thanks for the response.
>>>
>>> I'm aware that the numbers are not exact (hence the link to issue #1305
>>> in my initial post), and have been advocating slightly incorrect numbers
>>> with my colleagues and customers for some time already to prepare them for
>>> the moment we provide analytics with ES. But what bothers me is that they
>>> are *inconsistent*.
>>>
>>> If you look at my gist you see that I ran the same aggs 3 times right
>>> after each other. If we just look at the top item we see the following
>>> results:
>>>
>>>    1. { "key": "totaltrafficbos", "doc_count": 2880 }
>>>    2. { "key": "totaltrafficbos", "doc_count": 2552 }
>>>    3. { "key": "totaltrafficbos", "doc_count": 2179 }
>>>
>>> These results were taken within seconds of each other, without any change
>>> to the number of documents in the index. If I run the aggs more often, the
>>> count rotates between a handful of numbers. Is this also behavior one would
>>> expect from the aggs? And if so, why do the facets show the same number
>>> over and over again?
>>>
>>> Anyway, I will try to work my way through the aggs code this weekend to get
>>> a better grasp of what we could do with it, and what not.
>>>
>>> -- Nils
>>>
>>> On Friday, January 31, 2014 6:18:43 PM UTC+1, Binh Ly wrote:
>>>>
>>>> Nils,
>>>>
>>>> This is just the nature of splitting data around in shards. Actually
>>>> the terms facet has the same limitations (i.e. it will also give
>>>> "approximate counts"). Neither the terms facet nor the terms aggregation is
>>>> better or worse than the other - they are both approximations (using
>>>> different implementations). It is correct that if you put all your data in
>>>> 1 shard, then all the counts are exact. If you need to shard, you can
>>>> increase the "shard_size" parameter inside the terms aggregation to
>>>> "improve accuracy". Play with that number until it suits your purposes, but
>>>> the important thing is that they are just approximations, more so the more
>>>> documents you have in the index, so just don't expect absolute numbers from
>>>> them if you have more than 1 shard.
>>>>
>>>> {
>>>>   "size": 0,
>>>>   "aggs": {
>>>>     "a": {
>>>>       "terms": {
>>>>         "field": "actor.displayName",
>>>>         "shard_size": 10000
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/fb421a29-8923-4188-9363-03682fec71ab%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

