Thanks Matt, I suspected as much on #1. I think it would save a little post-processing if the date_histogram could return buckets for a specified range. The issue appears to be logged as https://github.com/elasticsearch/elasticsearch/issues/5224, and a pull request has been submitted. I tried the filter on #2, and it still picked up hosts that weren't in that doc type, so I filed https://github.com/elasticsearch/elasticsearch/issues/5458.
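Until the histogram can be told which range to cover (issue 5224), the missing leading and trailing buckets can be filled in on the client. Below is a minimal Python sketch of that post-processing. It makes three assumptions. First, the response has already been parsed into dicts. Second, bucket keys are epoch milliseconds rounded down to multiples of the fixed interval, which is assumed to hold for "300s" with no time zone offset. Third, `pad_buckets` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Hypothetical client-side padding for a date_histogram response: emit a bucket
# for every interval in [range_start_ms, range_end_ms], not only between the
# first and last indexed documents.
from typing import Any, Dict, List

Bucket = Dict[str, Any]


def pad_buckets(buckets: List[Bucket], range_start_ms: int,
                range_end_ms: int, interval_ms: int) -> List[Bucket]:
    """Return one bucket per interval across the range, adding empty ones.

    Assumes bucket keys are epoch milliseconds rounded down to multiples of
    interval_ms. Buckets outside the (rounded) range are left out.
    """
    by_key = {b["key"]: b for b in buckets}
    # Round the start down to an interval boundary so it lines up with the keys.
    key = range_start_ms - (range_start_ms % interval_ms)
    padded: List[Bucket] = []
    while key <= range_end_ms:
        # Reuse the server's bucket if present, otherwise add an empty one.
        padded.append(by_key.get(key, {"key": key, "doc_count": 0}))
        key += interval_ms
    return padded


INTERVAL_MS = 300_000  # "interval": "300s"
# Made-up response buckets: documents only in the 3rd and 5th intervals.
response_buckets = [
    {"key": 600_000, "doc_count": 4},
    {"key": 1_200_000, "doc_count": 1},
]
padded = pad_buckets(response_buckets, 0, 1_500_000, INTERVAL_MS)
print([(b["key"], b["doc_count"]) for b in padded])
# [(0, 0), (300000, 0), (600000, 4), (900000, 0), (1200000, 1), (1500000, 0)]
```

The padded buckets carry only `key` and `doc_count`. Any sub-aggregations, such as `events_by_host`, would need their own empty defaults if the charting code expects them.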
Cheers,
John
@jxstanford

On Mar 18, 2014, at 13:17:10, Matt Weber wrote:

> 1. The histogram aggregation (and facet) works on indexed values, not on
> the current time or "now". So if the last indexed document's timestamp is
> 3/15/14T16:15, you will not get empty buckets between 3/15/14T16:15 and the
> current time. It would be interesting to be able to set "to" and "from" on
> histogram-based aggregations so that buckets are generated for every
> interval within the defined range.
>
> 2. I believe this is due to the way the keys are pulled from fielddata,
> which is index-level data. So if you are searching the "_all" index, you
> are going to get data from all indices. Not sure if this is a bug or not.
> You can try applying a filter aggregation:
>
> POST _all/summary_phys/_search
> {
>   "aggs": {
>     "summary_phys_events": {
>       "filter": {
>         "type": { "value": "summary_phys_events" }
>       },
>       "aggs": {
>         "events_by_date": {
>           "date_histogram": {
>             "field": "@timestamp",
>             "interval": "300s",
>             "min_doc_count": 0
>           },
>           "aggs": {
>             "events_by_host": {
>               "terms": {
>                 "field": "host.raw",
>                 "min_doc_count": 0
>               },
>               "aggs": {
>                 "avg_used": { "avg": { "field": "used" } },
>                 "max_used": { "max": { "field": "used" } }
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> On Tue, Mar 18, 2014 at 12:39 PM, John Stanford wrote:
>
> Hi,
>
> I'm trying to get a better understanding of aggregations, so here are a
> couple of questions that came up recently.
>
> Question 1:
>
> I have some time-based data that I chart using aggregations. The data may
> be sparsely populated, so I've been setting min_doc_count to 0 to get empty
> buckets back anyway. I've noticed that it fills in empty buckets except
> those before the first record or after the last record in the range.
>
> For example, if I use a query similar to the one below and there are no
> records after 3/15/14T16:15, the last bucket returned will be for
> 3/15/14T16:15. On the other hand, if there is a gap between the start time
> and 3/15/14T16:15, I get a bucket with a doc count of 0 (as expected).
>
> POST _all/summary_phys/_search
> {
>   "aggs": {
>     "events_by_date": {
>       "date_histogram": {
>         "field": "@timestamp",
>         "interval": "300s",
>         "min_doc_count": 0
>       },
>       "aggs": {
>         "events_by_host": {
>           "terms": { "field": "host.raw" },
>           "aggs": {
>             "avg_used": { "avg": { "field": "used" } },
>             "max_used": { "max": { "field": "used" } }
>           }
>         }
>       }
>     }
>   }
> }
>
> Not getting the zero-count buckets back at the beginning and end of the
> range seems contrary to the documented purpose of min_doc_count. Am I doing
> something wrong?
>
> Question 2:
>
> If I add min_doc_count: 0 to the inner aggregation but limit the search to
> a specific doc type, like this:
>
>             doc type
>                v
> POST _all/summary_phys/_search
> {
>   "aggs": {
>     "events_by_date": {
>       "date_histogram": {
>         "field": "@timestamp",
>         "interval": "300s",
>         "min_doc_count": 0
>       },
>       "aggs": {
>         "events_by_host": {
>           "terms": {
>             "field": "host.raw",
>             "min_doc_count": 0
>           },
>           "aggs": {
>             "avg_used": { "avg": { "field": "used" } },
>             "max_used": { "max": { "field": "used" } }
>           }
>         }
>       }
>     }
>   }
> }
>
> I get buckets for hosts that do not appear in this doc type.
> For example, there are only three host values in this doc type
> [compute-4, compute-2, compute-3], but I get buckets back for hosts from
> other doc types, like:
>
> "events_by_host": {
>   "buckets": [
>     {
>       "key": "compute-4",
>       "doc_count": 11,
>       "max_used": { "value": 4608 },
>       "avg_used": { "value": 3677.090909090909 }
>     },
>     {
>       "key": "compute-2",
>       "doc_count": 8,
>       "max_used": { "value": 4608 },
>       "avg_used": { "value": 2304 }
>     },
>     {
>       "key": "compute-3",
>       "doc_count": 2,
>       "max_used": { "value": 4608 },
>       "avg_used": { "value": 4608 }
>     },
>     {
>       "key": "10.10.11.22:49509",
>       "doc_count": 0,
>       "max_used": { "value": null },
>       "avg_used": { "value": null }
>     },
>     {
>       "key": "controller",
>       "doc_count": 0,
>       "max_used": { "value": null },
>       "avg_used": { "value": null }
>     },
>     {
>       "key": "object-1",
>       "doc_count": 0,
>       "max_used": { "value": null },
>       "avg_used": { "value": null }
>     }
>   ]
> }
>
> Is there a way to ensure that the inner aggregation only creates buckets
> for terms that match the searched doc type?
>
> Thanks in advance...
>
> John
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/856133dc-c4ae-4cfc-adab-39453671d76d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
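Since the filter aggregation did not help with #2, one client-side workaround is to keep only hosts that have at least one document somewhere in the response. When the search is restricted to summary_phys, a host that appears only in other doc types always comes back with a doc_count of 0. Below is a sketch in Python. It assumes the date_histogram buckets have been parsed into dicts shaped like the excerpt above, and the helper names `hosts_with_docs` and `drop_foreign_hosts` are hypothetical. One caveat: it would also drop a summary_phys host that had no documents anywhere in the queried range.

```python
# Hypothetical client-side cleanup for a date_histogram -> terms response where
# the terms aggregation used "min_doc_count": 0 and picked up keys from other
# doc types.
from typing import Any, Dict, List, Set

Bucket = Dict[str, Any]


def hosts_with_docs(date_buckets: List[Bucket]) -> Set[str]:
    """Collect host keys that have at least one document in any date bucket."""
    seen: Set[str] = set()
    for date_bucket in date_buckets:
        for host_bucket in date_bucket["events_by_host"]["buckets"]:
            if host_bucket["doc_count"] > 0:
                seen.add(host_bucket["key"])
    return seen


def drop_foreign_hosts(date_buckets: List[Bucket]) -> List[Bucket]:
    """Return copies of the date buckets keeping only hosts seen with documents.

    Zero-count buckets for those hosts are kept, so gaps still show up.
    The input list and its dicts are not modified.
    """
    keep = hosts_with_docs(date_buckets)
    cleaned: List[Bucket] = []
    for date_bucket in date_buckets:
        hosts = [hb for hb in date_bucket["events_by_host"]["buckets"]
                 if hb["key"] in keep]
        new_bucket = dict(date_bucket)  # shallow copy of the date bucket
        new_bucket["events_by_host"] = dict(date_bucket["events_by_host"],
                                            buckets=hosts)
        cleaned.append(new_bucket)
    return cleaned


# Sample input: the first date bucket uses the host counts from the thread;
# the second bucket is made up for illustration.
date_buckets = [
    {"key": 0, "doc_count": 21, "events_by_host": {"buckets": [
        {"key": "compute-4", "doc_count": 11},
        {"key": "compute-2", "doc_count": 8},
        {"key": "compute-3", "doc_count": 2},
        {"key": "10.10.11.22:49509", "doc_count": 0},
        {"key": "controller", "doc_count": 0},
        {"key": "object-1", "doc_count": 0},
    ]}},
    {"key": 300000, "doc_count": 5, "events_by_host": {"buckets": [
        {"key": "compute-4", "doc_count": 5},
        {"key": "compute-3", "doc_count": 0},
        {"key": "controller", "doc_count": 0},
    ]}},
]

cleaned = drop_foreign_hosts(date_buckets)
print([[hb["key"] for hb in b["events_by_host"]["buckets"]] for b in cleaned])
# [['compute-4', 'compute-2', 'compute-3'], ['compute-4', 'compute-3']]
```

Note that compute-3 keeps its zero-count bucket in the second interval because it has documents elsewhere in the response, while controller is dropped everywhere.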
