Hi,

I'm trying to get a better understanding of aggregations, so here are a 
couple of questions that came up recently.

Question 1:

I have some time-based data that I am charting with aggregations.  The 
data may be sparsely populated, so I've been setting min_doc_count to 0 so 
that I get empty buckets back anyway.  I've noticed that empty buckets are 
only filled in between the first and last records in the range; empty 
buckets before the first record or after the last record are not returned.

For example, if I use a query similar to the one below, and there are no 
records after 3/15/14T16:15, the last bucket returned will be the one for 
3/15/14T16:15.  On the other hand, if there is a gap between the first 
record and 3/15/14T16:15, I will get a bucket with a 0 doc count (as 
expected).  

POST _all/summary_phys/_search

{
   "aggs": {
      "events_by_date": {
         "date_histogram": {
            "field": "@timestamp",
            "interval": "300s",
            "min_doc_count": 0
         },
         "aggs": {
            "events_by_host": {
               "terms": {
                  "field": "host.raw"
               },
               "aggs": {
                  "avg_used": {
                     "avg": {
                        "field": "used"
                     }
                  },
                  "max_used": {
                     "max": {
                        "field": "used"
                     }
                  }
               }
            }
         }
      }
   }
}

Not getting the 0 doc count buckets back at the front and back of the range 
seems contrary to the documented purpose of min_doc_count.  Am I doing 
something wrong?
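
To make the result I expected concrete, here is a minimal client-side 
sketch in Python.  It is not something Elasticsearch does in the query 
above; pad_buckets, start_ms and end_ms are hypothetical names, and it 
assumes bucket keys are epoch milliseconds aligned to the 300s interval.

```python
from typing import Dict, List

INTERVAL_MS = 300 * 1000  # matches "interval": "300s" in the query


def pad_buckets(buckets: List[Dict], start_ms: int, end_ms: int,
                interval_ms: int = INTERVAL_MS) -> List[Dict]:
    """Return one bucket per interval between start_ms and end_ms,
    adding zero-count buckets where the response has none."""
    by_key = {b["key"]: b for b in buckets}
    # Assumption: keys are aligned to multiples of the interval since epoch.
    first = start_ms - (start_ms % interval_ms)
    padded = []
    for key in range(first, end_ms + 1, interval_ms):
        padded.append(by_key.get(key, {"key": key, "doc_count": 0}))
    return padded


# Made-up buckets: the gap at 900000 is returned, the edges are not.
returned = [
    {"key": 600000, "doc_count": 4},
    {"key": 900000, "doc_count": 0},
    {"key": 1200000, "doc_count": 2},
]
padded = pad_buckets(returned, start_ms=0, end_ms=1800000)
print([b["doc_count"] for b in padded])
```

With the sample data, the padded list has empty buckets at both ends of 
the range as well as in the middle, which is what I expected 
min_doc_count = 0 to give me.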

Question 2:


If I add min_doc_count = 0 to the inner terms aggregation, but limit the 
search to a specific doc type, like this:

                      doc type
                           v
POST _all/summary_phys/_search
{
   "aggs": {
      "events_by_date": {
         "date_histogram": {
            "field": "@timestamp",
            "interval": "300s",
            "min_doc_count": 0
         },
         "aggs": {
            "events_by_host": {
               "terms": {
                  "field": "host.raw",
                  "min_doc_count": 0
               },
               "aggs": {
                  "avg_used": {
                     "avg": {
                        "field": "used"
                     }
                  },
                  "max_used": {
                     "max": {
                        "field": "used"
                     }
                  }
               }
            }
         }
      }
   }
}

I get buckets for hosts that do not appear in this doc type.  For example, 
this doc type has only 3 values for host [compute-4, compute-2, 
compute-3], but I also get zero-count buckets back for hosts from other 
doc types, like:

"events_by_host": {
                  "buckets": [
                     {
                        "key": "compute-4",
                        "doc_count": 11,
                        "max_used": {
                           "value": 4608
                        },
                        "avg_used": {
                           "value": 3677.090909090909
                        }
                     },
                     {
                        "key": "compute-2",
                        "doc_count": 8,
                        "max_used": {
                           "value": 4608
                        },
                        "avg_used": {
                           "value": 2304
                        }
                     },
                     {
                        "key": "compute-3",
                        "doc_count": 2,
                        "max_used": {
                           "value": 4608
                        },
                        "avg_used": {
                           "value": 4608
                        }
                     },
                     {
                        "key": "10.10.11.22:49509",
                        "doc_count": 0,
                        "max_used": {
                           "value": null
                        },
                        "avg_used": {
                           "value": null
                        }
                     },
                     {
                        "key": "controller",
                        "doc_count": 0,
                        "max_used": {
                           "value": null
                        },
                        "avg_used": {
                           "value": null
                        }
                     },
                     {
                        "key": "object-1",
                        "doc_count": 0,
                        "max_used": {
                           "value": null
                        },
                        "avg_used": {
                           "value": null
                        }
                     }
                  ]
            }

Is there a way to ensure that the inner aggregation also only buckets 
things matching the search doc type?
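
As a stopgap, the unwanted buckets could be dropped on the client.  The 
following is only a sketch under assumptions: hosts_with_docs and 
drop_foreign_hosts are hypothetical helpers, the sample buckets are made 
up (sub-aggregation values omitted), and a host is treated as belonging 
to the doc type if it has at least one document in any time bucket.

```python
from typing import Dict, List, Set


def hosts_with_docs(date_buckets: List[Dict]) -> Set[str]:
    """Collect every host key that has documents in at least one time bucket."""
    seen = set()
    for date_bucket in date_buckets:
        for host in date_bucket["events_by_host"]["buckets"]:
            if host["doc_count"] > 0:
                seen.add(host["key"])
    return seen


def drop_foreign_hosts(date_buckets: List[Dict]) -> List[Dict]:
    """Keep zero-count host buckets only for hosts seen somewhere in the response."""
    keep = hosts_with_docs(date_buckets)
    cleaned = []
    for date_bucket in date_buckets:
        hosts = [h for h in date_bucket["events_by_host"]["buckets"]
                 if h["key"] in keep]
        cleaned.append({**date_bucket, "events_by_host": {"buckets": hosts}})
    return cleaned


# Illustrative epoch-millisecond keys and counts, not real output.
response_buckets = [
    {"key": 1394900100000, "events_by_host": {"buckets": [
        {"key": "compute-4", "doc_count": 11},
        {"key": "compute-2", "doc_count": 8},
        {"key": "controller", "doc_count": 0},
    ]}},
    {"key": 1394900400000, "events_by_host": {"buckets": [
        {"key": "compute-4", "doc_count": 0},
        {"key": "controller", "doc_count": 0},
    ]}},
]
cleaned = drop_foreign_hosts(response_buckets)
```

Here "controller" is removed from both time buckets, while the 
zero-count "compute-4" bucket in the second interval is kept.  I would 
still prefer to have the aggregation itself do this.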

Thanks in advance...

John
