Hi Vlad,

I opened a pull request with the fix: https://github.com/elasticsearch/elasticsearch/pull/8180

Many thanks for reporting this issue!
Besides this bug the parent/child model works well, so I recommend keeping
it. I don't know exactly when the next 1.4 release will be out, but I
expect it within a week or two.

Martijn


On 21 October 2014 16:17, Vlad Vlaskin <[email protected]> wrote:

> Hi Martijn,
>
> great news, thank you!
>
> Would you recommend keeping the parent-child data model and waiting for a
> release? (Do you have a sense of the date?)
>
> Thank you
>
> Vlad
>
>
>
> On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:
>>
>> Hi Vlad,
>>
>> I reproduced it. The children agg doesn't take documents marked as
>> deleted into account properly.
>>
>> When documents are deleted, they are initially only marked as deleted
>> before they're removed from the index. This also applies to updates,
>> because an update translates into an index + delete.
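A deliberately simplified Python sketch of the mechanism described here. It is not Lucene's or Elasticsearch's actual code, and `ToyShard` and its methods are hypothetical names. Each update appends a new version of the document and only flags the previous version as deleted. An aggregation that ignores the flag therefore sums every version, while one that honours it sees only the latest.

```python
from dataclasses import dataclass


@dataclass
class StoredDoc:
    doc_id: str
    count: int
    live: bool = True  # False once this version is "marked as deleted"


class ToyShard:
    """Hypothetical, simplified model of an append-only Lucene-style index."""

    def __init__(self):
        self.docs = []  # versions are only appended, never overwritten in place

    def index(self, doc_id, count):
        # An update is an index + delete: flag the current version as deleted...
        for doc in self.docs:
            if doc.doc_id == doc_id and doc.live:
                doc.live = False
        # ...and append the new version. The old one stays until a merge.
        self.docs.append(StoredDoc(doc_id, count))

    def sum_live(self):
        # Correct: skip versions that are marked as deleted.
        return sum(doc.count for doc in self.docs if doc.live)

    def sum_ignoring_deletes(self):
        # The bug described in this thread: deleted versions are still counted.
        return sum(doc.count for doc in self.docs)


shard = ToyShard()
shard.index("1", 1)              # initial child document {"count": 1}
for new_count in (2, 3, 4):      # three scripted updates: count += 1
    shard.index("1", new_count)

print(shard.sum_live())              # 4
print(shard.sum_ignoring_deletes())  # 10
```

Real shards also drop deleted versions during segment merges, so the amount of overcounting depends on merge timing. The sketch ignores merging entirely.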
>>
>> The issue you're experiencing can also happen when not using the bulk
>> API. It may just be a bit less likely to manifest.
>>
>> The fix for this bug is small. I'll open a PR soon.
>>
>> Martijn
>>
>> On 21 October 2014 15:51, Vlad Vlaskin <[email protected]> wrote:
>>
>>> Hi Martijn,
>>>
>>> A couple of hours ago I tried to file a bug in the ES GitHub issues, and
>>> while writing the steps to reproduce I realized one more thing.
>>>
>>> *It happens only if you update the same child document within one bulk
>>> request.*
>>>
>>> I didn't manage to reproduce the "arithmetic progression" effect by
>>> curling my localhost, but it is still reproducible from Java code
>>> doing a bulk update (script + upsert doc).
>>> I understand that bulk-updating the same document is a pretty ugly thing,
>>> and I was surprised that it worked normally (without version conflict
>>> exceptions) from the Java client.
>>>
>>> In case it helps, these are the steps and queries to reproduce the
>>> parent-child setup by curling your localhost.
>>> Unfortunately I don't know how to send bulk updates with curl.
>>>
>>>
>>> #Create index "test" with parent-child mappings:
>>> curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'
>>>
>>> #Index parent document:
>>> curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'
>>>
>>> #Index child document:
>>> curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
>>>
>>> #Update child document:
>>> curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
>>>
>>> #Reference query (plain sum over all documents), it should return 2:
>>> curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
>>>
>>> #Query with children aggregation, expected 2:
>>> curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
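The bug only showed up when the same child document was updated several times within one bulk request, so a bulk reproduction may be useful. The sketch below builds an NDJSON body for the `_bulk` endpoint. That body updates child `metric/1` (parent `1`) n times with the same script as the `_update` call above, plus an upsert. It assumes the ES 1.x bulk format: `_index`, `_type`, `_id` and `_parent` in the action line, and `script`, `params` and `upsert` in the update body. `build_bulk_updates` is a hypothetical helper, not part of any client library. The sketch only builds the body and does not send anything.

```python
import json


def build_bulk_updates(n_updates, index="test", doc_type="metric",
                       doc_id="1", parent_id="1"):
    """Hypothetical helper: NDJSON _bulk body updating one child doc n times.

    Assumes the Elasticsearch 1.x bulk action/metadata format.
    """
    lines = []
    for _ in range(n_updates):
        # Action/metadata line; _parent routes the update to the parent's shard.
        action = {"update": {"_index": index, "_type": doc_type,
                             "_id": doc_id, "_parent": parent_id}}
        # Same script as the single-document _update call, plus an upsert so
        # the first action creates the child if it does not exist yet.
        body = {"script": "ctx._source.count+=ct",
                "params": {"ct": 1},
                "upsert": {"count": 1}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    # The bulk API requires every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a body with 4 updates of test/metric/1; redirect it to a file.
    print(build_bulk_updates(4), end="")
```

Redirect the output to `bulk.json` and post it with `curl -XPOST localhost:9200/_bulk --data-binary @bulk.json`. The `--data-binary` flag preserves the newlines the bulk API needs. Run it against an index that holds only the parent document. The first action then upserts `{"count":1}` and the remaining three increment it, so after a refresh the child should read `{"count":4}`.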
>>>
>>>
>>>
>>> Thank you
>>>
>>> On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
>>>>
>>>> Hi Vlad,
>>>>
>>>> What you're describing shouldn't happen. The child docs should get
>>>> detached. I think this is a bug.
>>>> Let me verify and get back to you.
>>>>
>>>> Martijn
>>>>
>>>> On 21 October 2014 13:26, Vlad Vlaskin <[email protected]> wrote:
>>>>
>>>>> After some experiments I believe I found the cause of the discrepancy
>>>>> problem:
>>>>>
>>>>> *Elasticsearch does not detach the old version of a child document from
>>>>> the parent-child aggregation after the child has been updated, and keeps
>>>>> using it in the children aggregation.*
>>>>>
>>>>> E.g. my child document is updated 4 times with a script (within a batch
>>>>> update), and it has 4 versions:
>>>>> {"count": 1}, {"count": 2}, {"count": 3}, {"count": 4}
>>>>>
>>>>> Querying the child document (after a refresh) shows the correct version:
>>>>> {"count": 4}
>>>>>
>>>>> But the children aggregation {"sum":{"field":"count"}} returns 10, because:
>>>>>
>>>>> 1 + 2 + 3 + 4 = 10
>>>>>
>>>>> This is quite consistent (e.g. for 5 versions you get 15).
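These numbers follow the sum of an arithmetic progression. Assume the child is written n times with values 1, 2, ..., n (each scripted update adds 1), and assume every superseded version is still counted by the children aggregation. The aggregated sum is then S(n), while the correct, live-only value is just n:

```latex
\[
S(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad S(4) = \frac{4 \cdot 5}{2} = 10,
\qquad S(5) = \frac{5 \cdot 6}{2} = 15.
\]
```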
>>>>>
>>>>> It explains the behavior here.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
>>>>>>
>>>>>> Dear ES group,
>>>>>> we've been using ES in production for a while and eagerly test all
>>>>>> new features, such as cardinality and others.
>>>>>>
>>>>>> We are trying data modeling with parent-child relations (ES version
>>>>>> 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.),
>>>>>> with the following data model:
>>>>>> *Parent*
>>>>>> {
>>>>>>   "key": "value"
>>>>>> }
>>>>>>
>>>>>> and a timeline with children, holding metrics:
>>>>>>
>>>>>> *Child* (type "metrics")
>>>>>> {
>>>>>>   "day": "2014-10-20",
>>>>>>   "count": 10
>>>>>> }
>>>>>>
>>>>>> We update the metric documents with script + upsert, and they are
>>>>>> indexed properly. The problem is that the query below *yields 2
>>>>>> different results in a round-robin way.*
>>>>>> E.g. the first time you call it you receive the first number, a second
>>>>>> later you receive the second, then the first again, etc.
>>>>>>
>>>>>> {
>>>>>>     "size": 0,
>>>>>>     "query": {
>>>>>>         "match_all": {}
>>>>>>     },
>>>>>>     "aggs": {
>>>>>>         "MY_FIELD": {
>>>>>>             "terms": {
>>>>>>                 "field": "FIELD-XYZ"             // parent term aggregation
>>>>>>             },
>>>>>>             "aggs": {
>>>>>>                 "children": {
>>>>>>                     "children": {
>>>>>>                         "type": "metrics"        // child aggregation of type "metrics"
>>>>>>                     },
>>>>>>                     "aggs": {
>>>>>>                         "requests": {
>>>>>>                             "sum": {
>>>>>>                                 "field": "count" // target aggregation within child documents
>>>>>>                             }
>>>>>>                         }
>>>>>>                     }
>>>>>>                 }
>>>>>>             }
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Result A:
>>>>>> "aggregations": {
>>>>>>       "MY_FIELD": {
>>>>>>          "doc_count_error_upper_bound": 0,
>>>>>>          "buckets": [
>>>>>>             {
>>>>>>                "key": "xx",
>>>>>>                "doc_count": 283322,
>>>>>>                "children": {
>>>>>>                   "doc_count": 3740372,
>>>>>>                   "requests": {
>>>>>>                      "value": *5801652297*
>>>>>>                   }
>>>>>>                }
>>>>>>             }
>>>>>>          ]
>>>>>>       }
>>>>>>    }
>>>>>>
>>>>>> Result B:
>>>>>> "aggregations": {
>>>>>>       "MY_FIELD": {
>>>>>>          "doc_count_error_upper_bound": 0,
>>>>>>          "buckets": [
>>>>>>             {
>>>>>>                "key": "xx",
>>>>>>                "doc_count": 302421,
>>>>>>                "children": {
>>>>>>                   "doc_count": 1877361,
>>>>>>                   "requests": {
>>>>>>                      "value": *2965346170*
>>>>>>                   }
>>>>>>                }
>>>>>>             }
>>>>>>          ]
>>>>>>       }
>>>>>>    }
>>>>>>
>>>>>> The switching between A and B back and forth is stable and
>>>>>> reproducible. The ES logs are clean.
>>>>>>
>>>>>> Could someone help with some ideas here?
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>>> msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40goo
>>>>> glegroups.com
>>>>> <https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Kind regards,
>>>>
>>>> Martijn van Groningen
>>>>
>>>
>>
>

