Hi Martijn,

Great news, thank you!

Would you recommend keeping the parent-child data model and waiting for a
release? (Do you have a rough idea of the date?)

Thank you

Vlad



On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:
>
> Hi Vlad, 
>
> I reproduced it. The children agg doesn't take documents marked as deleted 
> into account properly.
>
> When documents are deleted they are initially marked as deleted before 
> they're removed from the index. This also applies to updates, because an 
> update translates into an index + delete. 
>
> The issue you're experiencing can also happen when not using the bulk api. 
> It may just be a bit less likely to manifest.
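To make the mechanism described above concrete, here is a minimal Python sketch of a toy index. It is not Elasticsearch or Lucene code: the `ToyIndex` class and its method names are hypothetical. It only assumes what is stated above, namely that an update is an index plus a delete, and that deleted documents stay physically present until a later cleanup (a segment merge in Lucene).

```python
class ToyIndex:
    """Toy append-only index: documents are never modified in place."""

    def __init__(self):
        self.docs = []   # physical documents as (doc_id, source) tuples
        self.live = []   # live[i] is False once docs[i] is marked deleted

    def index(self, doc_id, source):
        # An update is "index + delete": mark older versions as deleted,
        # then append the new version.
        for i, (existing_id, _) in enumerate(self.docs):
            if existing_id == doc_id:
                self.live[i] = False
        self.docs.append((doc_id, source))
        self.live.append(True)

    def search(self, respect_deletes=True):
        # A correct reader skips documents that are marked as deleted.
        return [src for (_, src), alive in zip(self.docs, self.live)
                if alive or not respect_deletes]

    def merge(self):
        # Cleanup step: physically drop documents marked as deleted.
        kept = [(d, a) for d, a in zip(self.docs, self.live) if a]
        self.docs = [d for d, _ in kept]
        self.live = [True] * len(kept)


idx = ToyIndex()
for count in (1, 2, 3, 4):
    idx.index("metric-1", {"count": count})

visible = idx.search()                       # only the latest version
leaked = idx.search(respect_deletes=False)   # stale versions included
idx.merge()
after_merge = idx.search(respect_deletes=False)
```

A reader that ignores the deletion marks (`respect_deletes=False`) sees every stale version until the cleanup runs, which is the kind of window in which an aggregation could over-count.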
>
> The fix for this bug is small. I'll open a PR soon.
>
> Martijn
>
> On 21 October 2014 15:51, Vlad Vlaskin <[email protected]> wrote:
>
>> Hi Martijn,
>>
>> A couple of hours ago I tried to submit a bug to the ES GitHub issues and, 
>> while writing the steps to reproduce, realized one more thing.
>>
>> *It happens only if you update the same child document within one bulk 
>> request.*
>>
>> I didn't manage to reproduce the "arithmetic progression" effect 
>> by curling my localhost, but it is still reproducible from Java code 
>> doing a bulk update (script + upsert doc). 
>> I understand that bulk-updating the same document is a pretty ugly thing 
>> to do, and I was surprised that it worked normally (without exceptions 
>> about version conflicts) from the Java client. 
>>
>> In case it is helpful, these are the steps and queries to curl your 
>> localhost with parent-child.
>> Unfortunately I don't know how to write a curl command with bulk updates. 
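For the missing bulk step, the following is a hedged sketch rather than a verified reproduction. It builds a newline-delimited `_bulk` body in Python, assuming the Elasticsearch 1.x bulk format (an `update` action line followed by a script + upsert line) and assuming `_parent` is accepted as a metadata key on the action line. The helper name `bulk_update_body` is hypothetical, and the body has not been run against 1.4.0.Beta1.

```python
import json


def bulk_update_body(index, doc_type, doc_id, parent_id, times):
    """Build a newline-delimited _bulk body with `times` scripted updates
    of the same child document."""
    # Action/metadata line (assumed 1.x format, with routing to the parent).
    action = {"update": {"_index": index, "_type": doc_type,
                         "_id": doc_id, "_parent": parent_id}}
    # Same script as in the single-document update, plus an upsert document
    # that is used when the child does not exist yet.
    update = {"script": "ctx._source.count+=ct",
              "params": {"ct": 1},
              "upsert": {"count": 1}}
    lines = []
    for _ in range(times):
        lines.append(json.dumps(action))
        lines.append(json.dumps(update))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_update_body("test", "metric", "1", "1", times=4)
print(body, end="")
```

If the output is saved to a file such as `bulk.json`, it could be sent with `curl -XPOST localhost:9200/_bulk --data-binary @bulk.json`. The `--data-binary` flag matters because plain `-d` strips the newlines.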
>>
>>
>> #Create index "test" with parent-child mappings
>>
>> curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'
>>
>> #Index parent document:
>> curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'
>>
>> #Index child document:
>> curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
>>
>> #Update child document:
>> curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
>>
>> #Query with the benchmark query; it should return 2
>> curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
>>
>> #Query with the children aggregation query; expected 2
>> curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
>>
>>
>>
>> Thank you
>>
>> On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
>>>
>>> Hi Vlad,
>>>
>>> What you're describing shouldn't happen. The child docs should get 
>>> detached. I think this is a bug.
>>> Let me verify and get back to you.
>>>
>>> Martijn
>>>
>>> On 21 October 2014 13:26, Vlad Vlaskin <[email protected]> wrote:
>>>
>>>> After some experiments I believe I found the cause of the discrepancy 
>>>> problem:
>>>>
>>>> *After a child document has been updated, Elasticsearch does not detach 
>>>> its old versions from the parent, and the children aggregation still 
>>>> counts them.*
>>>>
>>>> E.g. my child document was updated 4 times with a script (within one 
>>>> batch update), and it has 4 versions:
>>>> { "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}
>>>>
>>>> A query for the child document (after refresh) shows the proper version: 
>>>> {"count": 4}
>>>>
>>>> But child aggregation {"sum":{"field":"count"}} shows you 10, because:
>>>>
>>>> 1 + 2 + 3 + 4 = 10
>>>>
>>>> The pattern holds consistently (e.g. for 5 versions you get 15). 
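The arithmetic above follows the triangular-number pattern. As a small check, assuming the counter starts at 1 and each update adds exactly 1, summing all n versions gives n(n+1)/2, while the correct result is just n. The function names below are hypothetical.

```python
def stale_sum(n):
    """Sum over all n versions of a counter incremented by 1 per update
    (closed form of 1 + 2 + ... + n)."""
    return n * (n + 1) // 2


def brute_force_sum(n):
    """Same sum, computed by adding every version explicitly."""
    return sum(range(1, n + 1))


for n in (4, 5):
    # Both computations agree; the correct aggregation result would be n.
    print(n, stale_sum(n), brute_force_sum(n))
```

For 4 versions this gives 10 and for 5 versions 15, matching the numbers reported above.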
>>>>
>>>> This explains the behavior we see.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
>>>>>
>>>>> Dear ES group,
>>>>> we've been using ES in production for a while and eagerly test all 
>>>>> new features such as cardinality and others.
>>>>>
>>>>> We are trying data modeling with parent-child relations (ES version 
>>>>> 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.)
>>>>> with this data model: 
>>>>> *Parent*
>>>>> {
>>>>>   "key": "value"  
>>>>> }
>>>>>
>>>>> and a timeline with children, holding metrics:
>>>>>
>>>>> *Child* (type "metrics")
>>>>> {
>>>>>   "day": "2014-10-20",
>>>>>   "count": 10
>>>>> }
>>>>>
>>>>> We update metric documents and properly index them with script+upsert.
>>>>> The problem is that the query below *yields 2 different results in a 
>>>>> round-robin way.*
>>>>> E.g. the first time you call it you receive the first number, a second 
>>>>> later you receive the second number, then the first again, etc. 
>>>>>
>>>>> {
>>>>>     "size": 0,
>>>>>     "query": {
>>>>>         "match_all": {}
>>>>>     },
>>>>>     "aggs": {
>>>>>         "MY_FIELD": {
>>>>>             "terms": {
>>>>>                 "field": "FIELD-XYZ"             // parent term aggregation 
>>>>>             },
>>>>>             "aggs": {
>>>>>                 "children": {
>>>>>                     "children": {
>>>>>                         "type": "metrics"        // child aggregation of type "metrics"
>>>>>                     },
>>>>>                     "aggs": {
>>>>>                         "requests": {
>>>>>                             "sum": {
>>>>>                                 "field": "count" // target aggregation within child documents
>>>>>                             } 
>>>>>                         }
>>>>>                     }
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>  Result A: 
>>>>> "aggregations": {
>>>>>       "MY_FIELD": {
>>>>>          "doc_count_error_upper_bound": 0,
>>>>>          "buckets": [
>>>>>             {
>>>>>                "key": "xx",
>>>>>                "doc_count": 283322,
>>>>>                "children": {
>>>>>                   "doc_count": 3740372,
>>>>>                   "requests": {
>>>>>                      "value": *5801652297*
>>>>>                   }
>>>>>                }
>>>>>             }
>>>>>          ]
>>>>>       }
>>>>>    }
>>>>>
>>>>> Result B:
>>>>> "aggregations": {
>>>>>       "MY_FIELD": {
>>>>>          "doc_count_error_upper_bound": 0,
>>>>>          "buckets": [
>>>>>             {
>>>>>                "key": "xx",
>>>>>                "doc_count": 302421,
>>>>>                "children": {
>>>>>                   "doc_count": 1877361,
>>>>>                   "requests": {
>>>>>                      "value": *2965346170*
>>>>>                   }
>>>>>                }
>>>>>             }
>>>>>          ]
>>>>>       }
>>>>>    }
>>>>>
>>>>> The switching between A and B, back and forth, is stable 
>>>>> and reproducible. 
>>>>> The ES logs are clean. 
>>>>>
>>>>> Could someone suggest some ideas here?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Vlad
>>>>>
>>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com.
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>>
>>> -- 
>>> Met vriendelijke groet,
>>>
>>> Martijn van Groningen 
>>>
>>
>
>
>
> -- 
> Met vriendelijke groet,
>
> Martijn van Groningen 
>
