Hello,

I'm trying to produce the distribution of documents that match vs. 
don't match a query, and get the cardinality of a field for both sets. The 
idea is "Users who did" vs. "Users who did not". In reality I'm 
running another aggregation under "did not" (otherwise I'd just subtract 
one count from the total), but the query here illustrates the issue I'm 
having:

*Query*

    "aggs": {
        "total_distinct_count": { "cardinality": { "field": "UserId" } },
        "has_thing": {
            "filter": { "term": { "State": "thing" } },
            "aggs": {
                "distinct_count": { "cardinality": { "field": "UserId" } }
            }
        },
        "does_not_have_thing": {
            "filter": { 
                "not" : { "term": { "State": "thing" } }
            },
            "aggs": { 
                "distinct_count": { "cardinality": { "field": "UserId" } }
            }    
        }
    }
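
For anyone who wants to reproduce this, here is a minimal sketch that wraps the fragment above into a complete request body. The `"size": 0` entry is an assumption on my part (it is not shown in the excerpt); it is just one way to end up with the empty `hits` array seen in the response.

```python
import json

# Aggregations copied verbatim from the query excerpt above.
aggs = {
    "total_distinct_count": {"cardinality": {"field": "UserId"}},
    "has_thing": {
        "filter": {"term": {"State": "thing"}},
        "aggs": {"distinct_count": {"cardinality": {"field": "UserId"}}},
    },
    "does_not_have_thing": {
        "filter": {"not": {"term": {"State": "thing"}}},
        "aggs": {"distinct_count": {"cardinality": {"field": "UserId"}}},
    },
}

# Assumption: "size": 0 is not in the original excerpt; it only suppresses hits.
body = {"size": 0, "aggs": aggs}

# Print the JSON that would be sent as the search request body.
print(json.dumps(body, indent=4))
```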

*Response*

   "hits": {
      "total": 3309709,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "total_distinct_count": {
         "value": 654556
      },
      "does_not_have_thing": {
         "doc_count": 2575512,
         "distinct_count": {
            "value": 563371
         }
      },
      "has_thing": {
         "doc_count": 734197,
         "distinct_count": {
            "value": 223128
         }
      }
   }

I would expect (aggregations.has_thing.distinct_count.value + 
aggregations.does_not_have_thing.distinct_count.value) to be close to 
aggregations.total_distinct_count.value, but in reality it's pretty far off 
(about +20%: 786,499 vs. 654,556). Note that the doc_count values add up 
exactly to hits.total, so I don't think this is an issue with the query, 
but I could be wrong. 
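
To make the gap concrete, here is a small sketch of the arithmetic using the values from the response above. The second half uses a made-up three-document data set (purely hypothetical, not my real data) to show one thing worth checking: if the same UserId can appear in documents on both sides of the filter, the two distinct counts overlap and would not be expected to sum to the total, even with exact counting. I have not confirmed that this is what is happening here.

```python
# Values copied from the response above.
total_distinct = 654556
has_thing_distinct = 223128
does_not_have_thing_distinct = 563371
hits_total = 3309709
doc_count_sum = 734197 + 2575512

bucket_sum = has_thing_distinct + does_not_have_thing_distinct
excess = bucket_sum - total_distinct
relative_excess = excess / total_distinct

print("sum of bucket distinct counts:", bucket_sum)
print("excess over total distinct count:", excess)
print("relative excess: {:.1%}".format(relative_excess))
print("doc_counts add up to hits.total:", doc_count_sum == hits_total)

# Hypothetical toy data: (UserId, State) pairs, invented for illustration only.
docs = [(1, "thing"), (1, "other"), (2, "other")]

users_with = {user for user, state in docs if state == "thing"}
users_without = {user for user, state in docs if state != "thing"}
users_all = {user for user, _ in docs}

# User 1 is counted in both buckets, so the bucket counts sum to 3, not 2.
print(len(users_with) + len(users_without), "vs", len(users_all))
print("users in both buckets:", users_with & users_without)
```

If the counts were exact, the excess computed in the first half would equal the number of UserIds that fall in both buckets (inclusion-exclusion); with approximate cardinality it would only be an estimate of that number.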

Any ideas what's up? Have I structured the query incorrectly? Is this a bug? 
Or is this just expected behavior? 

Some notes:

   - UserId's data type is *long*, but the values only fill the integer 
   range (510,539 to 418,346,844). 
   - I'm running Elasticsearch 1.1.0.
   - I've tried playing around with the precision threshold, but it doesn't 
   appear to make a difference. 
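
For reference, the precision threshold mentioned in the notes is set per cardinality aggregation through the `precision_threshold` option. The sketch below shows the shape of such an aggregation; `cardinality_agg` is a hypothetical helper named only for this example, and the value 40000 is illustrative rather than the exact setting I tried.

```python
import json


def cardinality_agg(field, precision_threshold=None):
    """Build a cardinality aggregation body (hypothetical helper for illustration)."""
    agg = {"field": field}
    if precision_threshold is not None:
        # Higher thresholds trade memory for accuracy at low cardinalities.
        agg["precision_threshold"] = precision_threshold
    return {"cardinality": agg}


# Illustrative value only; not necessarily the setting used in the tests above.
distinct_count = cardinality_agg("UserId", precision_threshold=40000)
print(json.dumps(distinct_count, indent=4))
```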

Thanks in advance,
Cheers
Phil 
