Hi Vlad, I reproduced it. The children agg doesn't take documents marked as deleted into account properly.
When documents are deleted, they are initially only marked as deleted before they're actually removed from the index. This also applies to updates, because an update translates into an index + delete. The issue you're experiencing can also happen when not using the bulk API; it may just be a bit less likely to manifest.

The fix for this bug is small. I'll open a PR soon.

Martijn

On 21 October 2014 15:51, Vlad Vlaskin <[email protected]> wrote:

> Hi Martijn,
>
> A couple of hours ago I tried to submit a bug to the ES GitHub issues, and while writing the steps to reproduce it I realized one more thing.
>
> *It happens only if you update the same child document within one bulk request.*
>
> I didn't manage to reproduce the "arithmetic progression" effect by curling my localhost, but it is still reproducible from Java code doing a bulk update (script + upsert doc).
> I understand that bulk-updating the same document is a pretty ugly thing, and I was surprised that it worked normally (without exceptions about version conflicts) from the Java client.
>
> In case it's helpful, these are the steps and queries to reproduce the parent-child setup by curling your localhost.
> Unfortunately I don't know how to send bulk updates with curl.
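Vlad mentions not knowing how to send a bulk request with curl. As a minimal sketch, assuming the Elasticsearch 1.x `_bulk` format (an action/metadata line followed by a source line, all newline-delimited), the Python below builds a body that updates the same child document several times in one request, which is the case Vlad says triggers the bug. `build_bulk_updates` is a hypothetical helper, and the `_parent` metadata field is an assumption about the 1.x bulk format; check it against the documentation for your version.

```python
import json


def build_bulk_updates(index, doc_type, doc_id, parent_id, times, increment=1):
    """Hypothetical helper: return an NDJSON _bulk body that updates the
    same child document `times` times within a single bulk request."""
    lines = []
    for _ in range(times):
        # Action/metadata line. `_parent` routes the update to the parent's
        # shard (assumed 1.x bulk metadata field; check your version's docs).
        lines.append(json.dumps({"update": {
            "_index": index, "_type": doc_type,
            "_id": doc_id, "_parent": parent_id,
        }}))
        # Source line: the script + upsert pattern described in the thread.
        lines.append(json.dumps({
            "script": "ctx._source.count+=ct",
            "params": {"ct": increment},
            "upsert": {"count": 1},
        }))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Four updates of child 1 (parent 1) in the "test" index used in the steps.
    print(build_bulk_updates("test", "metric", "1", "1", times=4), end="")
```

The output could then be sent with something like `python build_bulk.py > bulk.json` followed by `curl -XPOST localhost:9200/_bulk --data-binary @bulk.json` (the script and file names are placeholders). `--data-binary` matters here: curl's `-d @file` strips newlines, and the bulk API relies on them to separate lines.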
>
> # Create index "test" with parent-child mappings
> curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'
>
> # Index the parent document
> curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'
>
> # Index the child document
> curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
>
> # Update the child document
> curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
>
> # Benchmark query: the plain sum should return 2
> curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
>
> # Children aggregation query (run against the whole index so the parent matches): expected 2
> curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
>
> Thank you
>
> On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
>>
>> Hi Vlad,
>>
>> What you're describing shouldn't happen. The child docs should get detached. I think this is a bug.
>> Let me verify and get back to you.
>>
>> Martijn
>>
>> On 21 October 2014 13:26, Vlad Vlaskin <[email protected]> wrote:
>>
>>> After some experiments I believe I found the cause of the discrepancy problem:
>>>
>>> *Elasticsearch does not detach the old versions of a child document from the parent-child aggregation after the child has been updated; it keeps using them in the children aggregation.*
>>>
>>> E.g. I have my child updated 4 times with a script (within a batch update), and it has 4 versions:
>>> {"count": 1}, {"count": 2}, {"count": 3}, {"count": 4}
>>>
>>> A query for the child document (after a refresh) shows the proper version:
>>> {"count": 4}
>>>
>>> But the children aggregation {"sum":{"field":"count"}} shows 10, because:
>>>
>>> 1 + 2 + 3 + 4 = 10
>>>
>>> This holds quite precisely (e.g. for 5 versions you get 15).
>>>
>>> It explains the behavior described below.
>>>
>>> On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
>>>>
>>>> Dear ES group,
>>>>
>>>> We've been using ES in production for a while and eagerly test all new features, such as cardinality and others.
>>>>
>>>> We are trying out data modeling with parent-child relations (ES version 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.) with a data model of:
>>>>
>>>> *Parent*
>>>> {
>>>>   "key": "value"
>>>> }
>>>>
>>>> and a timeline of children holding metrics:
>>>>
>>>> *Child* (type "metrics")
>>>> {
>>>>   "day": "2014-10-20",
>>>>   "count": 10
>>>> }
>>>>
>>>> We update the metric documents and index them properly with script+upsert.
>>>> The problem is that the query below *yields 2 different results in a round-robin way.*
>>>> E.g. the first time you call it you receive the first number, a second later you receive the second, then the first again, etc.
>>>>
>>>> {
>>>>   "size": 0,
>>>>   "query": {
>>>>     "match_all": {}
>>>>   },
>>>>   "aggs": {
>>>>     "MY_FIELD": {
>>>>       "terms": {
>>>>         "field": "FIELD-XYZ"  // parent term aggregation
>>>>       },
>>>>       "aggs": {
>>>>         "children": {
>>>>           "children": {
>>>>             "type": "metrics"  // child aggregation of type "metrics"
>>>>           },
>>>>           "aggs": {
>>>>             "requests": {
>>>>               "sum": {
>>>>                 "field": "count"  // target aggregation within child documents
>>>>               }
>>>>             }
>>>>           }
>>>>         }
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> Result A:
>>>> "aggregations": {
>>>>   "MY_FIELD": {
>>>>     "doc_count_error_upper_bound": 0,
>>>>     "buckets": [
>>>>       {
>>>>         "key": "xx",
>>>>         "doc_count": 283322,
>>>>         "children": {
>>>>           "doc_count": 3740372,
>>>>           "requests": {
>>>>             "value": *5801652297*
>>>>           }
>>>>         }
>>>>       }
>>>>     ]
>>>>   }
>>>> }
>>>>
>>>> Result B:
>>>> "aggregations": {
>>>>   "MY_FIELD": {
>>>>     "doc_count_error_upper_bound": 0,
>>>>     "buckets": [
>>>>       {
>>>>         "key": "xx",
>>>>         "doc_count": 302421,
>>>>         "children": {
>>>>           "doc_count": 1877361,
>>>>           "requests": {
>>>>             "value": *2965346170*
>>>>           }
>>>>         }
>>>>       }
>>>>     ]
>>>>   }
>>>> }
>>>>
>>>> The problem is that the switching from A to B and back is pretty stable and reproducible.
>>>> The ES logs are clean.
>>>>
>>>> Could someone suggest some ideas here?
>>>>
>>>> Thank you!
>>>>
>>>> Vlad
>>
>> --
>> Kind regards,
>>
>> Martijn van Groningen
>

--
Kind regards,

Martijn van Groningen
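Martijn's explanation (an update is an index plus a delete, and deleted documents are only marked until they are actually removed) also accounts for Vlad's 1 + 2 + 3 + 4 = 10 observation. The toy Python model below illustrates that reading under stated assumptions; it is not Lucene or Elasticsearch code, and `Segment`, `sum_ignoring_deletes`, `sum_live_only` and `simulate` are hypothetical names. Each update marks the previous version as deleted and appends a new one, so an aggregation that skips the deleted-docs check sums every version.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an index segment: documents are only appended,
    and a delete just flips a flag instead of removing the document."""
    counts: list = field(default_factory=list)  # "count" value per doc slot
    live: list = field(default_factory=list)    # False once marked as deleted

    def index(self, count):
        """Append a new document and return its doc slot."""
        self.counts.append(count)
        self.live.append(True)
        return len(self.counts) - 1

    def update(self, doc, new_count):
        """Model an update as 'index the new version + delete the old one'."""
        self.live[doc] = False
        return self.index(new_count)


def sum_ignoring_deletes(seg):
    # Models the bug: every doc slot is summed, deleted or not.
    return sum(seg.counts)


def sum_live_only(seg):
    # Models the expected behaviour: docs marked as deleted are skipped.
    return sum(c for c, alive in zip(seg.counts, seg.live) if alive)


def simulate(versions):
    """Index a child with count=1, then update it up to count=`versions`."""
    seg = Segment()
    doc = seg.index(1)
    for count in range(2, versions + 1):
        doc = seg.update(doc, count)
    return sum_ignoring_deletes(seg), sum_live_only(seg)


if __name__ == "__main__":
    for n in (4, 5):
        buggy, correct = simulate(n)
        print(f"{n} versions: ignoring deletes = {buggy}, live only = {correct}")
```

Running it prints `4 versions: ignoring deletes = 10, live only = 4` and `5 versions: ignoring deletes = 15, live only = 5`, the same totals Vlad reported. In this model the over-count after n versions is 1 + 2 + ... + n = n(n+1)/2, while the correct sum is just the latest value.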
