Hi Martijn, great news, thank you!
Would you recommend keeping the parent-child data model and waiting for a release? (Do you have a feeling for the date?)

Thank you
Vlad

On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:
>
> Hi Vlad,
>
> I reproduced it. The children agg doesn't properly take documents marked
> as deleted into account.
>
> When documents are deleted, they are initially marked as deleted before
> they're removed from the index. This also applies to updates, because an
> update translates into an index + delete.
>
> The issue you're experiencing can also happen when not using the bulk API.
> It may just be a bit less likely to manifest.
>
> The fix for this bug is small. I'll open a PR soon.
>
> Martijn
>
> On 21 October 2014 15:51, Vlad Vlaskin <[email protected]> wrote:
>
>> Hi Martijn,
>>
>> A couple of hours ago I tried to submit a bug on the ES GitHub issues and,
>> while writing up the steps to reproduce, realized one more thing.
>>
>> *It happens only if you update the same child document within one bulk
>> request.*
>>
>> I didn't manage to reproduce the "arithmetic progression" effect by
>> curling my localhost, but it is still reproducible from Java code
>> doing a bulk update (script + upsert doc).
>> I understand that bulk-updating the same document is a pretty ugly thing,
>> and I was surprised when it worked normally (without exceptions about
>> version conflicts) from the Java client.
>>
>> In case it is helpful, these are the steps and queries to curl your
>> localhost with parent-child.
>> Unfortunately I don't know how to create a curl with bulk updates.
>>
>>
>> #Create index "test" with parent-child mappings
>> curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'
>>
>> #Index parent document:
>> curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'
>>
>> #Index child document:
>> curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'
>>
>> #Update child document:
>> curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'
>>
>> #Query with benchmark query, it should return 2
>> curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'
>>
>> #Query with child aggregation query, expected 2
>> curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'
>>
>>
>> Thank you
>>
>> On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:
>>>
>>> Hi Vlad,
>>>
>>> What you're describing shouldn't happen. The child docs should get
>>> detached. I think this is a bug.
>>> Let me verify and get back to you.
>>>
>>> Martijn
>>>
>>> On 21 October 2014 13:26, Vlad Vlaskin <[email protected]> wrote:
>>>
>>>> After some experiments I believe I found the cause of the discrepancy
>>>> problem:
>>>>
>>>> *Elasticsearch does not detach the old versions of a child document
>>>> after it has been updated, and still uses them in the children
>>>> aggregation.*
>>>>
>>>> E.g.
>>>> I have my child updated 4 times with a script (within a batch update),
>>>> and it has 4 versions:
>>>> { "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}
>>>>
>>>> A query for the child document (after refresh) shows the proper version:
>>>> {"count": 4}
>>>>
>>>> But the child aggregation {"sum":{"field":"count"}} shows 10, because:
>>>>
>>>> 1 + 2 + 3 + 4 = 10
>>>>
>>>> It works pretty accurately (e.g. for 5 you get 15).
>>>>
>>>> It explains the behavior here.
>>>>
>>>>
>>>> On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
>>>>>
>>>>> Dear ES group,
>>>>> we've been using ES in production for a while and eagerly test all
>>>>> newly arriving features such as cardinality and others.
>>>>>
>>>>> We are trying data modeling with parent-child relations (ES version
>>>>> 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.)
>>>>> With a data model of:
>>>>> *Parent*
>>>>> {
>>>>>   "key": "value"
>>>>> }
>>>>>
>>>>> and a timeline with children, holding metrics:
>>>>>
>>>>> *Child* (type "metrics")
>>>>> {
>>>>>   "day": "2014-10-20",
>>>>>   "count": 10
>>>>> }
>>>>>
>>>>> We update metric documents and properly index them with script+upsert.
>>>>> The problem is that the query below *yields 2 different results in a
>>>>> round-robin way.*
>>>>> E.g. the first time you call it you receive the first number, a second
>>>>> later you receive the second, and then again back to the first, etc.
>>>>>
>>>>> {
>>>>>   "size": 0,
>>>>>   "query": {
>>>>>     "match_all": {}
>>>>>   },
>>>>>   "aggs": {
>>>>>     "MY_FIELD": {
>>>>>       "terms": {
>>>>>         "field": "FIELD-XYZ" // parent term aggregation
>>>>>       },
>>>>>       "aggs": {
>>>>>         "children": {
>>>>>           "children": {
>>>>>             "type": "metrics" // child aggregation of type "metrics"
>>>>>           },
>>>>>           "aggs": {
>>>>>             "requests": {
>>>>>               "sum": {
>>>>>                 "field": "count" // target aggregation within child documents
>>>>>               }
>>>>>             }
>>>>>           }
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> Result A:
>>>>> "aggregations": {
>>>>>   "MY_FIELD": {
>>>>>     "doc_count_error_upper_bound": 0,
>>>>>     "buckets": [
>>>>>       {
>>>>>         "key": "xx",
>>>>>         "doc_count": 283322,
>>>>>         "children": {
>>>>>           "doc_count": 3740372,
>>>>>           "requests": {
>>>>>             "value": *5801652297*
>>>>>           }
>>>>>         }
>>>>>       }
>>>>>     ]
>>>>>   }
>>>>> }
>>>>>
>>>>> Result B:
>>>>> "aggregations": {
>>>>>   "MY_FIELD": {
>>>>>     "doc_count_error_upper_bound": 0,
>>>>>     "buckets": [
>>>>>       {
>>>>>         "key": "xx",
>>>>>         "doc_count": 302421,
>>>>>         "children": {
>>>>>           "doc_count": 1877361,
>>>>>           "requests": {
>>>>>             "value": *2965346170*
>>>>>           }
>>>>>         }
>>>>>       }
>>>>>     ]
>>>>>   }
>>>>> }
>>>>>
>>>>> The switching between A and B, back and forth, is pretty stable
>>>>> and reproducible.
>>>>> The ES logs are clear.
>>>>>
>>>>> Could someone help with some ideas here?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Vlad
>>>>>
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com.
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>> --
>>> Kind regards,
>>>
>>> Martijn van Groningen
>>>
>
> --
> Kind regards,
>
> Martijn van Groningen
>
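
A note on the mechanism described in this thread: the "arithmetic progression" (4 updates giving 10, 5 giving 15) is what you would expect if every update leaves the previous version in the index, marked as deleted, and an aggregation then sums all stored versions without skipping the deleted ones. The toy model below is only a sketch of that idea in plain Python. It does not touch Elasticsearch, and `StoredDoc`, `ToyIndex` and their methods are hypothetical names invented for illustration, not anything from the ES code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredDoc:
    """One stored version of a document (hypothetical toy structure)."""
    doc_id: str
    count: int
    deleted: bool = False  # marked as deleted, but not yet physically removed


class ToyIndex:
    """Append-only toy index: an update is modeled as index + delete."""

    def __init__(self) -> None:
        self.docs: List[StoredDoc] = []

    def index(self, doc_id: str, count: int) -> None:
        # Mark the current live version (if any) as deleted, then append the new one.
        for doc in self.docs:
            if doc.doc_id == doc_id and not doc.deleted:
                doc.deleted = True
        self.docs.append(StoredDoc(doc_id, count))

    def update(self, doc_id: str, delta: int) -> None:
        # Scripted update in the spirit of ctx._source.count += ct.
        current = next(d for d in self.docs if d.doc_id == doc_id and not d.deleted)
        self.index(doc_id, current.count + delta)

    def sum_live(self) -> int:
        # Expected behavior: only versions not marked as deleted are counted.
        return sum(d.count for d in self.docs if not d.deleted)

    def sum_ignoring_deletes(self) -> int:
        # Behavior analogous to the reported bug: deleted versions are counted too.
        return sum(d.count for d in self.docs)


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("metric-1", 1)       # first version: {"count": 1}
    for _ in range(3):             # three updates -> versions 2, 3, 4
        idx.update("metric-1", 1)
    print(idx.sum_live())              # 4
    print(idx.sum_ignoring_deletes())  # 10 = 1 + 2 + 3 + 4
```

In this model the inflated sum disappears once the deleted versions are physically dropped, which is one possible reading of why the numbers in a real cluster can differ between copies of the same data; that last point is an assumption, not something confirmed in the thread.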
