After some experiments, I believe I have found the cause of the
discrepancy:
*Elasticsearch does not detach the old versions of a child document from the
parent/child aggregation after the child has been updated, and still uses
them in the children aggregation.*
E.g. I updated my child document 4 times with a script (within a batch
update), so it has gone through 4 versions:
{ "count": 1}, { "count": 2}, { "count": 3}, { "count": 4}
A query for the child document (after a refresh) returns the proper version:
{"count": 4}
But the child aggregation {"sum":{"field":"count"}} returns 10, because:
1 + 2 + 3 + 4 = 10
The pattern holds consistently (e.g. after 5 updates you get 15).
That would explain the behavior described in my original message below.
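
To make the arithmetic concrete, here is a toy model in plain Python. It is only a sketch of the hypothesis above, not Elasticsearch code: the list of versions and both helper names are made up for illustration, and it assumes each scripted update increments "count" by 1.

```python
# Toy model of the hypothesis: every version of the child document stays
# visible to the aggregation, while a direct query only sees the latest one.
# Assumption: each scripted update increments "count" by 1.

def latest_value(versions):
    """What a query for the document returns: the newest version only."""
    return versions[-1]["count"]

def sum_over_all_versions(versions):
    """What the sum aggregation appears to compute: every version counted."""
    return sum(v["count"] for v in versions)

# Four updates produce four versions: count = 1, 2, 3, 4.
versions = [{"count": n} for n in range(1, 5)]

print(latest_value(versions))           # 4
print(sum_over_all_versions(versions))  # 10
```

With n updates the inflated sum is n * (n + 1) / 2, which matches the 10 and 15 I observed for 4 and 5 updates.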
On Tuesday, October 21, 2014 3:18:47 AM UTC+2, Vlad Vlaskin wrote:
>
> Dear ES group,
> we've been using ES in production for a while and eagerly test all
> new features, such as cardinality and others.
>
> We are trying data modeling with parent-child relations (ES version
> 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, lots of RAM, etc.)
> The data model is:
> *Parent*
> {
> "key": "value"
> }
>
> and a timeline with children, holding metrics:
>
> *Child* (type "metrics")
> {
> "day": "2014-10-20",
> "count": 10
> }
>
> We update metric documents and properly index them with script+upsert.
> The problem is that the query below *yields 2 different results in a
> round-robin way.*
> E.g. the first time you call it you receive the first number, a second
> later you receive the second, then the first again, etc.
>
> {
>   "size": 0,
>   "query": {
>     "match_all": {}
>   },
>   "aggs": {
>     "MY_FIELD": {
>       "terms": {
>         "field": "FIELD-XYZ" // parent term aggregation
>       },
>       "aggs": {
>         "children": {
>           "children": {
>             "type": "metrics" // child aggregation of type "metrics"
>           },
>           "aggs": {
>             "requests": {
>               "sum": {
>                 "field": "count" // target aggregation within child documents
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> Result A:
> "aggregations": {
>   "MY_FIELD": {
>     "doc_count_error_upper_bound": 0,
>     "buckets": [
>       {
>         "key": "xx",
>         "doc_count": 283322,
>         "children": {
>           "doc_count": 3740372,
>           "requests": {
>             "value": *5801652297*
>           }
>         }
>       }
>     ]
>   }
> }
>
> Result B:
> "aggregations": {
>   "MY_FIELD": {
>     "doc_count_error_upper_bound": 0,
>     "buckets": [
>       {
>         "key": "xx",
>         "doc_count": 302421,
>         "children": {
>           "doc_count": 1877361,
>           "requests": {
>             "value": *2965346170*
>           }
>         }
>       }
>     ]
>   }
> }
>
> The switching back and forth between A and B is stable and
> reproducible.
> The ES logs show nothing unusual.
>
> Could someone suggest some ideas here?
>
> Thank you!
>
> Vlad
>
>