Hi Martijn,

Could you help me with another question on this topic?
I read that ES stores parent-child relations in heap memory. Could this bug prevent some objects from being garbage-collected, i.e., cause a memory leak? And what happens when the heap is exhausted but more parent-child relations keep coming in?

The reason I'm asking is that our cluster (8 r3.xlarge nodes, etc.) went down after 2 days of updating parent-child relations. The index volume is tiny, but the number of child document updates is huge.

Thank you.
Vlad

On Tuesday, October 21, 2014 4:38:55 PM UTC+2, Martijn v Groningen wrote:

Hi Vlad,

I opened: https://github.com/elasticsearch/elasticsearch/pull/8180

Many thanks for reporting this issue!
Apart from this bug the parent/child model works well, so I recommend keeping it. I don't know exactly when the next 1.4 release will be out, but I expect it within a week or two.

Martijn

On 21 October 2014 16:17, Vlad Vlaskin wrote:

Hi Martijn,

great news, thank you!

Would you recommend keeping the parent-child data model and waiting for a release? (Do you have a feeling for the date?)

Thank you

Vlad

On Tuesday, October 21, 2014 4:01:47 PM UTC+2, Martijn v Groningen wrote:

Hi Vlad,

I reproduced it. The children agg doesn't properly take documents marked as deleted into account.

When documents are deleted, they are first marked as deleted before they're removed from the index. This also applies to updates, because an update translates into an index + delete.

The issue you're experiencing can also happen when not using the bulk API. It may just be a bit less likely to manifest.

The fix for this bug is small. I'll open a PR soon.

Martijn

On 21 October 2014 15:51, Vlad Vlaskin wrote:

Hi Martijn,

A couple of hours ago I tried to submit a bug to the ES GitHub issues, and while writing the reproduction steps I realized one more thing:
*It happens only if you update the same child document within one bulk request.*

I say this because I didn't manage to reproduce the "arithmetic progression" effect by curling my localhost, but it is still reproducible from Java code doing a bulk update (script + upsert doc).
I understand that bulk-updating the same document is a pretty ugly thing, and I was surprised when it worked normally (without version-conflict exceptions) from the Java client.

In case it is helpful, these are the steps and queries to curl your localhost with parent-child.
Unfortunately I don't know how to create a curl with bulk updates.

    # Create index "test" with parent-child mappings
    curl -XPUT localhost:9200/test -d '{"mappings":{"root":{"properties":{"country":{"type":"string"}}},"metric":{"_parent":{"type":"root"},"properties":{"count":{"type":"long"}}}}}'

    # Index parent document
    curl -XPUT localhost:9200/test/root/1 -d '{"country":"de"}'

    # Index child document
    curl -XPUT 'http://localhost:9200/test/metric/1?parent=1' -d '{"count":1}'

    # Update child document
    curl -XPOST 'http://localhost:9200/test/metric/1/_update?parent=1' -d '{"script":"ctx._source.count+=ct", "params":{"ct":1}}'

    # Query with the benchmark query; it should return 2
    curl -XGET localhost:9200/test/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"requests":{"sum":{"field":"count"}}}}'

    # Query with the children aggregation; expected 2
    curl -XGET localhost:9200/test/metric/_search -d '{"size":0,"query":{"match_all":{}},"aggs":{"child":{"children":{"type":"metric"},"aggs":{"requests":{"sum":{"field":"count"}}}}}}'

Thank you

On Tuesday, October 21, 2014 3:33:35 PM UTC+2, Martijn v Groningen wrote:

Hi Vlad,

What you're describing shouldn't happen.
The child docs should get detached. I think this is a bug.
Let me verify and get back to you.

Martijn

On 21 October 2014 13:26, Vlad Vlaskin wrote:

After some experiments I believe I found the cause of the discrepancy:

*Elasticsearch does not detach the old versions of a child document from the parent-child aggregation after the child has been updated, and still uses them in the children aggregation.*

E.g. I updated my child 4 times with a script (within a batch update), so it has 4 versions:
{"count": 1}, {"count": 2}, {"count": 3}, {"count": 4}

A query for the child document (after a refresh) shows the correct version:
{"count": 4}

But the children aggregation {"sum":{"field":"count"}} returns 10, because:

1 + 2 + 3 + 4 = 10

This is quite consistent (e.g. with 5 versions you get 15).

This explains the behavior I reported earlier.

On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:

Dear ES group,

we've been using ES in production for a while and eagerly test new features such as the cardinality aggregation and others.

We are trying data modeling with parent-child relations (ES version 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.), with a data model of:

*Parent*

    {
      "key": "value"
    }

and a timeline of children holding metrics:

*Child* (type "metrics")

    {
      "day": "2014-10-20",
      "count": 10
    }

We update metric documents and index them properly with script + upsert.
The problem is that the query below *yields two different results in a round-robin way.*
E.g. the first time you call it you receive the first number, a second later you receive the second, then back to the first, etc.
    {
      "size": 0,
      "query": {
        "match_all": {}
      },
      "aggs": {
        "MY_FIELD": {
          "terms": {
            "field": "FIELD-XYZ"      // parent term aggregation
          },
          "aggs": {
            "children": {
              "children": {
                "type": "metrics"     // child aggregation of type "metrics"
              },
              "aggs": {
                "requests": {
                  "sum": {
                    "field": "count"  // target aggregation within child documents
                  }
                }
              }
            }
          }
        }
      }
    }

Result A:

    "aggregations": {
      "MY_FIELD": {
        "doc_count_error_upper_bound": 0,
        "buckets": [
          {
            "key": "xx",
            "doc_count": 283322,
            "children": {
              "doc_count": 3740372,
              "requests": {
                "value": 5801652297
              }
            }
          }
        ]
      }
    }

Result B:

    "aggregations": {
      "MY_FIELD": {
        "doc_count_error_upper_bound": 0,
        "buckets": [
          {
            "key": "xx",
            "doc_count": 302421,
            "children": {
              "doc_count": 1877361,
              "requests": {
                "value": 2965346170
              }
            }
          }
        ]
      }
    }

The switching between A and B is stable and reproducible.
The ES logs are clean.

Does anyone have some ideas?

Thank you!

Vlad
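Martijn's diagnosis and Vlad's arithmetic fit together. An update is an index plus a delete, and the old version is only marked as deleted until a merge removes it. Vlad saw 1 + 2 + 3 + 4 = 10. The toy model below illustrates how those connect. It is a minimal sketch, not Elasticsearch or Lucene code: `Segment`, `upsert`, `merge` and `sum_counts` are hypothetical names, and `honor_deletes=False` stands in for a children aggregation that ignores the deleted marker. The merged `replica` copy illustrates one possible explanation for the alternating A/B results, which is not confirmed in this thread. That explanation assumes searches are spread over shard copies whose segments are not merged identically.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy model of a Lucene-style segment (an illustration, not ES internals).

    Every indexed version is appended; a delete only clears a live flag,
    and the entry stays around until a merge rewrites the segment.
    """
    docs: list = field(default_factory=list)  # (doc_id, count), one per version
    live: list = field(default_factory=list)  # live[i] is False once docs[i] is deleted

    def upsert(self, doc_id: str, count: int) -> None:
        # An update translates into index + delete: mark the current version
        # as deleted, then append the new version.
        for i, (existing_id, _) in enumerate(self.docs):
            if existing_id == doc_id and self.live[i]:
                self.live[i] = False
        self.docs.append((doc_id, count))
        self.live.append(True)

    def merge(self) -> None:
        # A merge physically drops the entries marked as deleted.
        self.docs = [d for d, alive in zip(self.docs, self.live) if alive]
        self.live = [True] * len(self.docs)


def sum_counts(seg: Segment, honor_deletes: bool) -> int:
    """Sum the 'count' field; honor_deletes=False mimics the children-agg bug."""
    return sum(count for (_, count), alive in zip(seg.docs, seg.live)
               if alive or not honor_deletes)


primary = Segment()
for n in range(1, 5):  # the same child doc updated to count 1, 2, 3, 4
    primary.upsert("1", n)
replica = copy.deepcopy(primary)
replica.merge()  # hypothetical: only this copy has merged away its deleted docs

print(sum_counts(primary, honor_deletes=True))   # 4
print(sum_counts(primary, honor_deletes=False))  # 10 = 1 + 2 + 3 + 4
print(sum_counts(replica, honor_deletes=False))  # 4
```

In this model, k updates of one child give a buggy sum of 1 + 2 + ... + k = k(k + 1)/2. That matches the 15 Vlad reports for 5 versions.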
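Vlad mentions not knowing how to send bulk updates with curl. The script below is a minimal sketch of one way to do it. It assumes the 1.x `_bulk` format: newline-delimited action and body lines, with the child's parent id in the `_parent` action metadata. It prints a body that updates the same child document several times in a single request, reusing the index, type, script and parameter from the curl steps earlier in the thread. The helper name `bulk_update_body` is hypothetical. The `upsert` document is an assumption that mirrors the "script + upsert" updates done from the Java client.

```python
import json


def bulk_update_body(index: str, doc_type: str, doc_id: str, parent: str,
                     increments: list) -> str:
    """Build a newline-delimited _bulk body that updates one child doc repeatedly.

    Hypothetical helper for reproduction purposes; it assumes the ES 1.x bulk
    format, where a child's parent id goes into the "_parent" metadata field.
    """
    lines = []
    for ct in increments:
        # Action/metadata line: which document to update and its parent.
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type,
                                            "_id": doc_id, "_parent": parent}}))
        # Update body: same script and param as the single-update curl, plus an
        # upsert doc that is indexed only if the child does not exist yet.
        lines.append(json.dumps({"script": "ctx._source.count+=ct",
                                 "params": {"ct": ct},
                                 "upsert": {"count": ct}}))
    # The bulk body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


# Three updates of child "1" (parent "1") in the "test" index, in one request.
body = bulk_update_body("test", "metric", "1", "1", [1, 1, 1])
print(body, end="")
```

To use it, redirect the output to a file, for example `python make_bulk.py > bulk.json` (the script name is hypothetical). Then send it with `curl -XPOST localhost:9200/_bulk --data-binary @bulk.json`. `--data-binary` matters here because plain `-d @file` strips the newlines the bulk format relies on. Afterwards, the two search queries from the reproduction steps can be rerun and compared.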
-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/42a73156-f6fb-4e9d-b1da-2615710ea97d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
