Hi,

I have now updated the gist with a file in bulk index format.
I also split the loading phase from the testing phase, so you can run the 
test multiple times in a row.
I also added a README.md with instructions on how to run the test.
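
As a rough illustration of what "bulk index format" means here (this is a 
sketch, not the script from the gist): each document is preceded by an action 
line, and the whole body is newline-delimited JSON ending in a newline. The 
index name "aggsbug" and type name "doc" below are placeholders I made up for 
the example, not names taken from the gist.

```python
import json


def to_bulk_lines(docs, index="aggsbug", doc_type="doc"):
    """Yield bulk lines: one action line, then one source line, per document.

    `index` and `doc_type` are hypothetical placeholder names.
    """
    for doc in docs:
        # Action line telling the bulk endpoint to index the next line.
        yield json.dumps({"index": {"_index": index, "_type": doc_type}})
        # Source line: the document itself, serialized on a single line.
        yield json.dumps(doc)


def to_bulk_body(docs, **kwargs):
    """Join the bulk lines with newlines; the body must end with a newline."""
    return "\n".join(to_bulk_lines(docs, **kwargs)) + "\n"


if __name__ == "__main__":
    sample = [{"actor": {"displayName": "totaltrafficbos"}}]
    print(to_bulk_body(sample), end="")
```

The resulting text can be saved to a file and posted to the _bulk endpoint, 
for example with curl and --data-binary so the newlines are preserved.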

I'm also filing a bug report, as requested here: 
http://www.elasticsearch.org/blog/0-90-11-1-0-0-rc2-released/

On Wednesday, February 5, 2014 9:49:40 AM UTC+1, Jörg Prante wrote:
>
> Sorry, but your file at https://gist.github.com/8803745.git is broken: 
> it contains invalid JSON, so it cannot be processed.
>
> It would be helpful to provide a script with escaped JSON in bulk format.
>
> From what I suspect, you do not use keyword analyzer for faceting/agg'ing, 
> so you will get all kinds of unwanted results. If that explains your 
> fluctuating aggs results, I can not tell. It is rather uncommon to use 
> "facets" and "aggs" side by side.
>
> Jörg
>
>
>
> On Tue, Feb 4, 2014 at 3:01 PM, Nils Dijk <[email protected]> wrote:
>
>> To follow up,
>>
>> I have a contained test suite at https://gist.github.com/thanodnl/8803745 
>> for this problem. It contains two files:
>>
>>    1. aggsbug.sh
>>    2. aggsbug.json
>>
>> The .json file contains ~1M newline-separated documents to load into the 
>> database. I was not able to create a curl request to load them directly 
>> into the index.
>> The .sh file (https://gist.github.com/thanodnl/8803745/raw/aggsbug.sh) 
>> contains the instructions for recreating this behavior.
>>
>> I have run these against the following versions:
>>
>>    1. 1.0.0.Beta2
>>    2. 1.0.0.RC1
>>    3. 1.0.0-SNAPSHOT as compiled from the git 1.0 branch on commit 
>>    0f8b41ffad9b5ecdfd543d7c73edcf404e6fc763 
>>
>> When run on 1.0.0.Beta2 it gives the same output consistently when I run 
>> the _search over and over again.
>> When run on 1.0.0.RC1 it gives me multiple different outcomes, 
>> comparable to the numbers I posted earlier in the thread.
>> When run on 1.0.0-SNAPSHOT it behaves the same as on 1.0.0.RC1.
>>
>> That it still was working on 1.0.0.Beta2 proves to me that it is a bug 
>> that got into RC1. I could not find any related ticket on the issues page 
>> of the github repository. Hopefully this is enough information to recreate 
>> the problem.
>>
>> The json file is quite big and may cause problems when you open the gist 
>> in a browser. Cloning the gist locally will work best:
>> $ git clone https://gist.github.com/8803745.git
>>
>> I do not really know how to move on from here. Do you want me to open an 
>> issue for this problem at github.com/elasticsearch/elasticsearch? It 
>> would be nice to fix this problem before a release of 1.0.0 since that is 
>> the first release containing the aggregations for analytics.
>>
>> On Tuesday, February 4, 2014 12:31:10 PM UTC+1, Nils Dijk wrote:
>>
>>> I've loaded the same dataset in ES1.0.0.Beta2 with the same index 
>>> configuration as in the topic start.
>>>
>>> However now the numbers are consistent if I call the same aggregation 
>>> multiple times in a row AND the numbers match the numbers of the facets. 
>>> This leads me to the conclusion something is broken from Beta2 to RC1!
>>>
>>> I would like to test this on master, but I could not find any nightly 
>>> builds of elasticsearch. Is there a location where they are stored or 
>>> should I compile it myself?
>>>
>>> On Friday, January 31, 2014 6:43:07 PM UTC+1, Nils Dijk wrote:
>>>>
>>>> Hi Binh Ly,
>>>>
>>>> Thanks for the response.
>>>>
>>>> I'm aware that the numbers are not exact (hence the link to issue #1305 
>>>> in my initial post), and have been advocating slightly incorrect numbers 
>>>> with my colleagues and customers for some time already to prepare them for 
>>>> the moment we provide analytics with ES. But what bothers me is that they 
>>>> are *inconsistent*.
>>>>
>>>> If you look at my gist you see that I ran the same aggs 3 times right 
>>>> after each other. If we just look at the top item we see the following 
>>>> results:
>>>>
>>>>    1. { "key": "totaltrafficbos", "doc_count": 2880 }
>>>>    2. { "key": "totaltrafficbos", "doc_count": 2552 }
>>>>    3. { "key": "totaltrafficbos", "doc_count": 2179 }
>>>>    
>>>> These results were taken within seconds of each other, without any change 
>>>> to the number of documents in the index. If I run them more often, the 
>>>> result rotates between a handful of numbers. Is this also behavior one 
>>>> would expect from the aggs? And if so, why do the facets show the same 
>>>> number over and over again?
>>>>
>>>> Anyway, I will try to work myself through the aggs code this weekend to 
>>>> get a better hang of what we could do with it, and what not.
>>>>
>>>> -- Nils
>>>>
>>>> On Friday, January 31, 2014 6:18:43 PM UTC+1, Binh Ly wrote:
>>>>>
>>>>> Nils,
>>>>>
>>>>> This is just the nature of splitting data across shards. Actually, 
>>>>> the terms facet has the same limitation (i.e. it will also give 
>>>>> "approximate counts"). Neither the terms facet nor the terms aggregation 
>>>>> is better or worse than the other: they are both approximations (using 
>>>>> different implementations). It is correct that if you put all your data 
>>>>> in 1 shard, then all the counts are exact. If you need to shard, you can 
>>>>> increase the "shard_size" parameter inside the terms aggregation to 
>>>>> "improve accuracy". Play with that number until it suits your purposes, 
>>>>> but the important thing is that they are just approximations, all the 
>>>>> more so the more documents you have in the index, so don't expect 
>>>>> absolute numbers from them if you have more than 1 shard.
>>>>>
>>>>> {
>>>>>   "size": 0,
>>>>>   "aggs": {
>>>>>     "a": {
>>>>>       "terms": {
>>>>>         "field": "actor.displayName",
>>>>>         "shard_size": 10000
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/fb421a29-8923-4188-9363-03682fec71ab%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
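
For reference, the shard-level approximation Binh Ly describes in the quoted 
message can be modeled roughly as follows. This is a simplified toy model and 
not Elasticsearch's actual implementation: every shard reports only its top 
shard_size terms, and the coordinating node sums what it receives. The 
function names and the sample data are made up for the illustration. Note that 
this model is deterministic, so it shows why counts can be too low, but it 
would not by itself produce the run-to-run fluctuation reported in this thread.

```python
from collections import Counter


def approx_terms(shards, size, shard_size):
    """Sum each shard's top `shard_size` terms and return the top `size`."""
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own most frequent terms.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


def exact_terms(shards, size):
    """Count over all documents as if they lived in a single shard."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)


if __name__ == "__main__":
    # Hypothetical data: "b" is second on each shard but first overall.
    shards = [["a"] * 5 + ["b"] * 4, ["c"] * 5 + ["b"] * 4]
    print("exact:        ", exact_terms(shards, 1))
    print("shard_size=1: ", approx_terms(shards, 1, 1))
    print("shard_size=2: ", approx_terms(shards, 1, 2))
```

With the sample data, a shard_size of 1 drops "b" entirely because it is 
never the top term on any single shard, while a shard_size of 2 recovers the 
exact count.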

To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b911b272-53c6-4bd2-9185-4f66dfeb0de0%40googlegroups.com.
