Hi Vlad,

I opened: https://github.com/elasticsearch/elasticsearch/pull/8180
Many thanks for reporting this issue! Apart from this bug, the parent/child model works well, so I recommend keeping it. I don't know exactly when the next 1.4 release will be out, but I expect it within a week or two.

Martijn

On 21 October 2014 16:17, Vlad Vlaskin <[email protected]> wrote:

> Hi Martijn,
>
> great news, thank you!
>
> Would you recommend keeping the parent-child data model and waiting for a
> release? (Do you have a feeling for the date?)
>
> Thank you
>
> Vlad
>
> On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:
>>
>> Hi Vlad,
>>
>> I reproduced it. The children agg doesn't properly take documents marked
>> as deleted into account.
>>
>> When documents are deleted, they are initially marked as deleted before
>> they're removed from the index. This also applies to updates, because an
>> update translates into an index + delete.
>>
>> The issue you're experiencing can also happen when not using the bulk
>> API. It may just be a bit less likely to manifest.
>>
>> The fix for this bug is small. I'll open a PR soon.
>>
>> Martijn
>>
>> On 21 October 2014 15:51, Vlad Vlaskin <[email protected]> wrote:
>>
>>> Hi Martijn,
>>>
>>> A couple of hours ago I tried to submit a bug on the ES GitHub issues,
>>> and while writing up the steps to reproduce I realized one more thing.
>>>
>>> *It happens only if you update the same child document within one bulk
>>> request.*
>>>
>>> I didn't manage to reproduce the "arithmetic progression" effect
>>> by curling my localhost, but it is still reproducible from Java code
>>> doing a bulk update (script + upsert doc).
>>> I understand that bulk-updating the same document is a pretty ugly thing,
>>> and I was surprised when it worked normally (without exceptions about
>>> version conflicts) from the Java client.
>>>
>>> In case it is helpful, these are the steps and queries to curl your
>>> localhost with parent-child.
>>> Unfortunately I don't know how to create a curl with bulk updates.
>>>
>>> # Create index "test" with parent-child mappings
>>> curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'
>>>
>>> # Index parent document:
>>> curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'
>>>
>>> # Index child document:
>>> curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
>>>
>>> # Update child document:
>>> curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
>>>
>>> # Query with benchmark query, it should return 2
>>> curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
>>>
>>> # Query with child aggregation query, expected 2
>>> curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
>>>
>>> Thank you
>>>
>>> On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
>>>>
>>>> Hi Vlad,
>>>>
>>>> What you're describing shouldn't happen. The child docs should get
>>>> detached. I think this is a bug.
>>>> Let me verify and get back to you.
>>>>
>>>> Martijn
>>>>
>>>> On 21 October 2014 13:26, Vlad Vlaskin <[email protected]> wrote:
>>>>
>>>>> After some experiments I believe I found the cause of the discrepancy
>>>>> problem:
>>>>>
>>>>> *Elasticsearch does not detach a child document from the parent-child
>>>>> aggregation after it has been updated, and still uses the old versions
>>>>> in the child aggregation.*
>>>>>
>>>>> E.g.
>>>>> I have my child updated 4 times with a script (within a batch
>>>>> update), and it has 4 versions:
>>>>> { "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}
>>>>>
>>>>> A query for the child document (after refresh) shows the proper version:
>>>>> {"count": 4}
>>>>>
>>>>> But the child aggregation {"sum":{"field":"count"}} shows 10, because:
>>>>>
>>>>> 1 + 2 + 3 + 4 = 10
>>>>>
>>>>> It works pretty accurately (e.g. for 5 you get 15).
>>>>>
>>>>> That explains the behavior here.
>>>>>
>>>>> On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
>>>>>>
>>>>>> Dear ES group,
>>>>>> we've been using ES in production for a while and eagerly test all
>>>>>> newly arriving features, such as cardinality and others.
>>>>>>
>>>>>> We are trying data modeling with parent-child relations (ES version
>>>>>> 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.)
>>>>>> with a data model of:
>>>>>>
>>>>>> *Parent*
>>>>>> {
>>>>>>   "key": "value"
>>>>>> }
>>>>>>
>>>>>> and a timeline with children, holding metrics:
>>>>>>
>>>>>> *Child* (type "metrics")
>>>>>> {
>>>>>>   "day": "2014-10-20",
>>>>>>   "count": 10
>>>>>> }
>>>>>>
>>>>>> We update metric documents and index them properly with script+upsert.
>>>>>> The problem is that the query below *yields 2 different results
>>>>>> in a round-robin way.*
>>>>>> E.g. the first time you call it you receive the first number, a second
>>>>>> later you receive the second, then the first again, etc.
>>>>>>
>>>>>> {
>>>>>>   "size": 0,
>>>>>>   "query": {
>>>>>>     "match_all": {}
>>>>>>   },
>>>>>>   "aggs": {
>>>>>>     "MY_FIELD": {
>>>>>>       "terms": {
>>>>>>         "field": "FIELD-XYZ" // parent term aggregation
>>>>>>       },
>>>>>>       "aggs": {
>>>>>>         "children": {
>>>>>>           "children": {
>>>>>>             "type": "metrics" // child aggregation of type "metrics"
>>>>>>           },
>>>>>>           "aggs": {
>>>>>>             "requests": {
>>>>>>               "sum": {
>>>>>>                 "field": "count" // target aggregation within child documents
>>>>>>               }
>>>>>>             }
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> Result A:
>>>>>> "aggregations": {
>>>>>>   "MY_FIELD": {
>>>>>>     "doc_count_error_upper_bound": 0,
>>>>>>     "buckets": [
>>>>>>       {
>>>>>>         "key": "xx",
>>>>>>         "doc_count": 283322,
>>>>>>         "children": {
>>>>>>           "doc_count": 3740372,
>>>>>>           "requests": {
>>>>>>             "value": *5801652297*
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>     ]
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> Result B:
>>>>>> "aggregations": {
>>>>>>   "MY_FIELD": {
>>>>>>     "doc_count_error_upper_bound": 0,
>>>>>>     "buckets": [
>>>>>>       {
>>>>>>         "key": "xx",
>>>>>>         "doc_count": 302421,
>>>>>>         "children": {
>>>>>>           "doc_count": 1877361,
>>>>>>           "requests": {
>>>>>>             "value": *2965346170*
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>     ]
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> The problem is that switching between A and B back and forth is pretty
>>>>>> stable and reproducible.
>>>>>> The ES logs are clean.
>>>>>>
>>>>>> Could someone help with some ideas here?
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com.
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> Kind regards,
>>>>
>>>> Martijn van Groningen
>>
>> --
>> Kind regards,
>>
>> Martijn van Groningen

--
Kind regards,

Martijn van Groningen
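
For readers following the thread, the "arithmetic progression" effect (1 + 2 + 3 + 4 = 10) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Elasticsearch or Lucene code: `ToySegment`, `Doc`, `sum_live` and `sum_ignoring_deletes` are hypothetical names invented for illustration. It models an update as "index a new version + mark the old one as deleted", and compares a sum that skips documents marked as deleted with one that does not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """One stored version of a child document in the toy segment."""
    doc_id: str
    count: int
    deleted: bool = False  # marked as deleted, but still physically present


class ToySegment:
    """Hypothetical stand-in for an index segment (illustration only)."""

    def __init__(self) -> None:
        self.docs: List[Doc] = []

    def index(self, doc_id: str, count: int) -> None:
        # Mark any live version of this id as deleted, then append the new one.
        for doc in self.docs:
            if doc.doc_id == doc_id and not doc.deleted:
                doc.deleted = True
        self.docs.append(Doc(doc_id, count))

    def update(self, doc_id: str, delta: int) -> None:
        # Scripted update modeled as: read live version, index new version.
        current = next(
            d for d in self.docs if d.doc_id == doc_id and not d.deleted
        )
        self.index(doc_id, current.count + delta)

    def sum_live(self) -> int:
        # Correct behavior: skip versions marked as deleted.
        return sum(d.count for d in self.docs if not d.deleted)

    def sum_ignoring_deletes(self) -> int:
        # Buggy behavior: old versions are counted as well.
        return sum(d.count for d in self.docs)


segment = ToySegment()
segment.index("1", 1)          # {"count": 1}
for _ in range(3):
    segment.update("1", 1)     # {"count": 2}, {"count": 3}, {"count": 4}

print(segment.sum_live())              # 4
print(segment.sum_ignoring_deletes())  # 10
```

With one more update the second sum becomes 15, which matches the numbers reported in the thread. In this toy model the inflated sum is the triangular number n(n+1)/2 for n versions starting at 1.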
