Hi Martijn,

The bucket key for parent-term aggregation is the same. 


To make the explanation simpler, today I tried two queries:

*Query A: Sum of the field "count" in child documents directly.*

GET: INDEX-NAME/child/_search

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "requests": {
            "sum": {
                "field": "count"
            }
        }
    }
}


*Query B: Sum of the field "count" through parent documents.*
GET: INDEX-NAME/_search  (we query all doc types here)

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "child": {
            "children": {
                "type": "child" 
            },
            "aggs": {
                "requests": {
                    "sum": {
                        "field": "count"
                    }
                }
            }
        }
    }
}

I expected these numbers to be about the same, but they differ from each 
other by a large factor:

Result from query A: 

 "hits": {
      "total": 4614829,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "requests": {
         "value": *53364274 // number makes sense*
      }
   }


Result  from query B:

"hits": {
      "total": 4908110,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "child": {
         "doc_count": 13267677,
         "requests": {
            "value": *11208150231   // number does not make any sense*
         }
      }
   }
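
To put numbers on the mismatch, here is a minimal Python sketch that only does arithmetic on the figures copied from the two responses above. It was not run against the cluster, and the variable names are illustrative, not anything from Elasticsearch.

```python
# Figures copied from the two responses above.
sum_direct = 53364274            # Query A: sum of "count" over child docs
sum_via_children = 11208150231   # Query B: sum of "count" via the children aggregation
child_hits = 4614829             # Query A: hits.total for type "child"
children_doc_count = 13267677    # Query B: doc_count of the children bucket

# How far apart are the two sums and the two document counts?
sum_ratio = sum_via_children / sum_direct
count_ratio = children_doc_count / child_hits

# Mean "count" per document as seen by each query.
mean_direct = sum_direct / child_hits
mean_via_children = sum_via_children / children_doc_count

print(f"sum ratio:        {sum_ratio:.1f}x")
print(f"doc_count ratio:  {count_ratio:.2f}x")
print(f"mean count (A):   {mean_direct:.2f}")
print(f"mean count (B):   {mean_via_children:.2f}")
```

The sum grows far more than the document count does, so the mean "count" per document is also very different between the two queries. Counting every child document the same number of extra times would leave that mean unchanged, so uniform double counting alone probably does not explain the gap.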

I just want to understand whether this is a problem with the feature 
(parent-child aggregation) or something wrong with my data modeling.
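
For the alternating results described in the original message quoted below, a small tally of repeated runs may help confirm the pattern. The following is a sketch only: `run_query` is a hypothetical callable that would send the search and return the "requests" value, and `make_fake_query` is a stand-in that replays the two sums from that message so the example runs without a cluster.

```python
from collections import Counter


def tally_results(run_query, attempts=10):
    """Call run_query() repeatedly and count how often each value comes back."""
    return Counter(run_query() for _ in range(attempts))


def make_fake_query(values):
    """Hypothetical stand-in for a real search call: cycles through `values`."""
    state = {"i": 0}

    def run_query():
        value = values[state["i"] % len(values)]
        state["i"] += 1
        return value

    return run_query


# The two sums reported in the original message, replayed in alternation.
counts = tally_results(make_fake_query([5801652297, 2965346170]), attempts=10)
for value, seen in counts.items():
    print(value, "seen", seen, "times")
```

With a real `run_query`, it may also be worth repeating the tally with the search `preference` request parameter set to a fixed value, to see whether the result stops alternating when every request is routed to the same shard copies.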


Thank you  

On Tuesday, October 21, 2014 10:03:07 AM UTC+2, Martijn v Groningen wrote:
>
> Hi Vlad,
>
> I see that the doc_count is also different between the requests. Is the 
> actual bucket key also different between A and B?
>
> Martijn 
>
> On 21 October 2014 03:18, Vlad Vlaskin <[email protected]> wrote:
>
>> Dear ES group,
>> we've been using ES in production for a while and eagerly test all 
>> new features, such as cardinality and others.
>>
>> We are trying data modeling with parent-child relations (ES version 
>> 1.4.0.Beta1, 8 nodes, EC2 r3.xlarge, SSD, plenty of RAM, etc.)
>> With a data model of: 
>> *Parent*
>> {
>>   "key": "value"  
>> }
>>
>> and a timeline with children, holding metrics:
>>
>> *Child* (type "metrics")
>> {
>>  "day": "2014-10-20",
>>   "count": 10
>> }
>>
>> We update metric documents and properly index them with script+upsert.
>> The problem is that the query below *yields 2 different results in a 
>> round-robin way.*
>> E.g. the first time you call it you receive the first number, a second 
>> later you receive the second, and then the first again, etc. 
>>
>> {
>>     "size": 0,
>>     "query": {
>>         "match_all": {}
>>     },
>>     "aggs": {
>>         "MY_FIELD": {
>>             "terms": {
>>                 "field": "FIELD-XYZ"             // parent term aggregation
>>             },
>>             "aggs": {
>>                 "children": {
>>                     "children": {
>>                         "type": "metrics"        // child aggregation of type "metrics"
>>                     },
>>                     "aggs": {
>>                         "requests": {
>>                             "sum": {
>>                                 "field": "count" // target aggregation within child documents
>>                             } 
>>                         }
>>                     }
>>                 }
>>             }
>>         }
>>     }
>> }
>>
>>  Result A: 
>> "aggregations": {
>>       "MY_FIELD": {
>>          "doc_count_error_upper_bound": 0,
>>          "buckets": [
>>             {
>>                "key": "xx",
>>                "doc_count": 283322,
>>                "children": {
>>                   "doc_count": 3740372,
>>                   "requests": {
>>                      "value": *5801652297*
>>                   }
>>                }
>>             }
>>          ]
>>       }
>>    }
>>
>> Result B:
>> "aggregations": {
>>       "MY_FIELD": {
>>          "doc_count_error_upper_bound": 0,
>>          "buckets": [
>>             {
>>                "key": "xx",
>>                "doc_count": 302421,
>>                "children": {
>>                   "doc_count": 1877361,
>>                   "requests": {
>>                      "value": *2965346170*
>>                   }
>>                }
>>             }
>>          ]
>>       }
>>    }
>>
>> The switching back and forth between A and B is stable 
>> and reproducible. 
>> ES logs are clear. 
>>
>> Could someone suggest some ideas here?
>>
>> Thank you!
>>
>> Vlad
>>
>
>
>
> -- 
> Met vriendelijke groet,
>
> Martijn van Groningen 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2280502d-1c92-4e2b-81f1-aa87c41d81ca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
