After some experiments, I believe I have found the cause of the discrepancy:

*After a child document is updated, Elasticsearch does not detach its previous 
versions from the parent, and the children aggregation keeps counting them.*

For example, I updated one child 4 times with a script (within a bulk update), 
so it has 4 versions:
{"count": 1}, {"count": 2}, {"count": 3}, {"count": 4}
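A bulk update of this kind might look roughly like the Python sketch below. It is a hypothetical reconstruction, not the request from this thread: the index name, document ids and script are placeholders, and the body assumes the ES 1.x bulk format (an action line carrying `_parent`, followed by a `script`/`params`/`upsert` line). Under those assumptions the first update inserts the upsert document ({"count": 1}) and each later one adds 1, giving the 4 versions above.

```python
import json

# Hypothetical placeholders: the thread does not give the index, ids or script.
INDEX = "myindex"
CHILD_TYPE = "metrics"
PARENT_ID = "parent-1"
CHILD_ID = "child-1"


def bulk_body(n_updates):
    """Build an ES 1.x-style bulk body that updates the same child n_updates times."""
    lines = []
    for _ in range(n_updates):
        # Action/metadata line; _parent routes the child to its parent's shard.
        lines.append(json.dumps({"update": {
            "_index": INDEX, "_type": CHILD_TYPE,
            "_id": CHILD_ID, "_parent": PARENT_ID}}))
        # Request line: run the script if the doc exists, else index the upsert doc.
        lines.append(json.dumps({
            "script": "ctx._source.count += inc",  # hypothetical script
            "params": {"inc": 1},
            "upsert": {"day": "2014-10-20", "count": 1}}))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body(4))
```

The body would then be sent to the `_bulk` endpoint; only its shape is shown here.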

A query for the child document (after a refresh) returns the correct version: 
{"count": 4}

But the children aggregation {"sum":{"field":"count"}} returns 10, because 
every version is summed:

1 + 2 + 3 + 4 = 10

The pattern is consistent (e.g. with 5 versions you get 15).
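To make the arithmetic concrete, here is a small Python model of this hypothesis (a sketch, not Elasticsearch internals). It assumes every version of the updated child stays visible to the children aggregation, while a plain query sees only the latest one; with n versions the sum is then the triangular number n(n+1)/2.

```python
def stale_children_sum(n_updates):
    """Model of the hypothesis: all versions of one child are still aggregated.

    Version k has count == k (the upsert writes 1, each scripted update adds 1).
    Returns (count seen by a plain query, sum reported by the aggregation).
    """
    versions = [{"count": k} for k in range(1, n_updates + 1)]
    latest = versions[-1]["count"]                  # only the newest version
    aggregated = sum(v["count"] for v in versions)  # every version counted
    return latest, aggregated


for n in (4, 5):
    latest, aggregated = stale_children_sum(n)
    # Closed form of the same sum: n * (n + 1) / 2
    assert aggregated == n * (n + 1) // 2
    print(n, latest, aggregated)
```

Running it prints `4 4 10` and `5 5 15`, matching the numbers above.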

This explains the behavior I reported in my original message.

On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
>
> Dear ES group,
> we've been using ES in production for a while and eagerly test all 
> new features, such as cardinality.
>
> We are trying to model our data with parent-child relations (ES version 
> 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, plenty of RAM, etc.).
> The data model is:
> *Parent*
> {
>   "key": "value"  
> }
>
> and a timeline of child documents holding metrics:
>
> *Child* (type "metrics")
> {
>   "day": "2014-10-20",
>   "count": 10
> }
>
> We update the metric documents and index them correctly using script + upsert.
> *The problem is that the query below returns 2 different results, 
> alternating in a round-robin way.*
> E.g. the first time you call it you get the first number, a second later 
> you get the second, then the first again, and so on.
>
> {
>     "size": 0,
>     "query": {
>         "match_all": {}
>     },
>     "aggs": {
>         "MY_FIELD": {
>             "terms": {
>                 "field": "FIELD-XYZ"             // parent term aggregation
>             },
>             "aggs": {
>                 "children": {
>                     "children": {
>                         "type": "metrics"        // child aggregation of type "metrics"
>                     },
>                     "aggs": {
>                         "requests": {
>                             "sum": {
>                                 "field": "count" // target aggregation within child documents
>                             } 
>                         }
>                     }
>                 }
>             }
>         }
>     }
> }
>
> Result A:
> "aggregations": {
>       "MY_FIELD": {
>          "doc_count_error_upper_bound": 0,
>          "buckets": [
>             {
>                "key": "xx",
>                "doc_count": 283322,
>                "children": {
>                   "doc_count": 3740372,
>                   "requests": {
>                      "value": *5801652297*
>                   }
>                }
>             }
>          ]
>       }
>    }
>
> Result B:
> "aggregations": {
>       "MY_FIELD": {
>          "doc_count_error_upper_bound": 0,
>          "buckets": [
>             {
>                "key": "xx",
>                "doc_count": 302421,
>                "children": {
>                   "doc_count": 1877361,
>                   "requests": {
>                      "value": *2965346170*
>                   }
>                }
>             }
>          ]
>       }
>    }
>
> The switching between A and B is stable and reproducible. 
> The ES logs show no errors. 
>
> Could anyone suggest some ideas here?
>
> Thank you!
>
> Vlad
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2ce80724-b3d9-4d58-b54e-15727f999564%40googlegroups.com.
