Re: questions about aggregation min_doc_count = 0

John Stanford Tue, 18 Mar 2014 20:40:23 -0700

Thanks Matt, I suspected as much on #1.  It would save a little 
post-processing if the histogram returned buckets for the whole specified 
range.  That appears to be logged as 
https://github.com/elasticsearch/elasticsearch/issues/5224, and a pull request 
has been made.  I tried the filter on #2, and it still picked up hosts that 
weren’t in that doc type, so I filed 
https://github.com/elasticsearch/elasticsearch/issues/5458.
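
For anyone charting in the meantime, the post-processing I mean on #1 is 
roughly the following.  This is only a minimal Python sketch, not anything 
Elasticsearch provides: pad_buckets, start_ms, end_ms and interval_ms are 
hypothetical names, and it assumes each date_histogram bucket carries an 
epoch-millisecond "key" and a "doc_count", with keys aligned to multiples of 
the interval.

```python
def pad_buckets(buckets, start_ms, end_ms, interval_ms):
    """Return one bucket per interval in [start_ms, end_ms).

    Buckets returned by the date_histogram are kept as-is; intervals it did
    not return (before the first or after the last indexed document) are
    filled with a zero-count placeholder.
    """
    # Index the returned buckets by their epoch-millisecond key.
    by_key = {b["key"]: b for b in buckets}

    # Assumption: bucket keys are multiples of the interval, so round the
    # requested start down to the nearest interval boundary.
    first_key = start_ms - (start_ms % interval_ms)

    padded = []
    for key in range(first_key, end_ms, interval_ms):
        padded.append(by_key.get(key, {"key": key, "doc_count": 0}))
    return padded


# Example: a single 300s bucket, padded out to a 15-minute window.
returned = [{"key": 600000, "doc_count": 2}]
padded = pad_buckets(returned, start_ms=300000, end_ms=1200000,
                     interval_ms=300000)
```

The placeholder buckets carry no sub-aggregations, so chart code that reads 
events_by_host would need to treat a missing key as empty.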


Cheers,

John

[email protected]
@jxstanford



On Mar 18, 2014, at 13:17:10, Matt Weber <[email protected]> wrote:

> 1.  The histogram aggregation (and facet) works on indexed values, not on 
> the current time or "now".  So, if the last indexed document timestamp is 
> 3/15/14T16:15 you will not get empty buckets between 3/15/14T16:15 and the 
> current time.  It would be interesting to be able to set the "to" and "from" 
> on histogram based aggregations to allow for generating buckets at every 
> interval within the defined range.
> 
> 2.  I believe this comes from the way the keys are pulled from the 
> fielddata, which is index-level data.  So if you are using the "_all" index 
> you are going to get keys from all indices.  Not sure if this is a bug or 
> not.  You can try applying a filter aggregation:
> 
> POST _all/summary_phys/_search
> {
>   "aggs": {
>     "summary_phys_events": {
>       "filter": {
>         "type": {"value": "summary_phys_events"}
>       },
>       "aggs": {
>         "events_by_date": {
>           "date_histogram": {
>             "field": "@timestamp",
>             "interval": "300s",
>             "min_doc_count": 0
>           },
>           "aggs": {
>             "events_by_host": {
>               "terms": {
>                 "field": "host.raw",
>                 "min_doc_count": 0
>               },
>               "aggs": {
>                 "avg_used": {
>                   "avg": {
>                     "field": "used"
>                   }
>                 },
>                 "max_used": {
>                   "max": {
>                     "field": "used"
>                   }
>                 }
>               }
>             }
>           }
>         }
>       }
>     }
>   }
> }
> 
> 
> 
> 
> 
> On Tue, Mar 18, 2014 at 12:39 PM, John Stanford <[email protected]> wrote:
> Hi,
> 
> I'm trying to get a better understanding of aggregations, so here are a 
> couple of questions that came up recently.
> 
> Question 1:
> 
> I have some time based data that I am using aggregations to chart.  The data 
> may be sparsely populated, so I've been setting min_doc_count to 0 so I get 
> empty buckets back anyway.  I've noticed that it will fill in empty buckets 
> unless they are before the first or after the last record of the range.  
> 
> For example, if I use a query similar to the one below, and there are no 
> records after 3/15/14T16:15, the last aggregation record will be for 
> 3/15/14T16:15.  On the other hand, if there is a gap in between the start 
> time and 3/15/14T16:15, I will get a bucket with a 0 doc count (as expected). 
>  
> 
> POST _all/summary_phys/_search
> 
> {
>    "aggs": {
>       "events_by_date": {
>          "date_histogram": {
>             "field": "@timestamp",
>             "interval": "300s",
>             "min_doc_count": 0
>          },
>          "aggs": {
>             "events_by_host": {
>                "terms": {
>                   "field": "host.raw"
>                },
>                "aggs": {
>                   "avg_used": {
>                      "avg": {
>                         "field": "used"
>                      }
>                   },
>                   "max_used": {
>                      "max": {
>                         "field": "used"
>                      }
>                   }
>                }
>             }
>          }
>       }
>    }
> }
> 
> Not getting the 0 doc count buckets back at the front and back of the range 
> seems contrary to the documented purpose of min_doc_count.  Am I doing 
> something wrong?
> 
> Question 2:
> 
> 
> If I add min_doc_count = 0 to the inner aggregation, but limit the search 
> to a specific doc type, like this:
> 
>                       doc type
>                            v
> POST _all/summary_phys/_search
> {
>    "aggs": {
>       "events_by_date": {
>          "date_histogram": {
>             "field": "@timestamp",
>             "interval": "300s",
>             "min_doc_count": 0
>          },
>          "aggs": {
>             "events_by_host": {
>                "terms": {
>                   "field": "host.raw",
>                   "min_doc_count": 0
>                },
>                "aggs": {
>                   "avg_used": {
>                      "avg": {
>                         "field": "used"
>                      }
>                   },
>                   "max_used": {
>                      "max": {
>                         "field": "used"
>                      }
>                   }
>                }
>             }
>          }
>       }
>    }
> }
> 
> I get buckets with entries matching hosts that do not show up in this doc 
> type.  For example, I have only 3 values for host in this doc type 
> [compute-4, compute-2, compute-3], but I will get buckets back with hosts 
> from other doc types like:
> 
> "events_by_host": {
>                   "buckets": [
>                      {
>                         "key": "compute-4",
>                         "doc_count": 11,
>                         "max_used": {
>                            "value": 4608
>                         },
>                         "avg_used": {
>                            "value": 3677.090909090909
>                         }
>                      },
>                      {
>                         "key": "compute-2",
>                         "doc_count": 8,
>                         "max_used": {
>                            "value": 4608
>                         },
>                         "avg_used": {
>                            "value": 2304
>                         }
>                      },
>                      {
>                         "key": "compute-3",
>                         "doc_count": 2,
>                         "max_used": {
>                            "value": 4608
>                         },
>                         "avg_used": {
>                            "value": 4608
>                         }
>                      },
>                      {
>                         "key": "10.10.11.22:49509",
>                         "doc_count": 0,
>                         "max_used": {
>                            "value": null
>                         },
>                         "avg_used": {
>                            "value": null
>                         }
>                      },
>                      {
>                         "key": "controller",
>                         "doc_count": 0,
>                         "max_used": {
>                            "value": null
>                         },
>                         "avg_used": {
>                            "value": null
>                         }
>                      },
>                      {
>                         "key": "object-1",
>                         "doc_count": 0,
>                         "max_used": {
>                            "value": null
>                         },
>                         "avg_used": {
>                            "value": null
>                         }
>                      }
>                   ]
>             }
> 
> Is there a way to ensure that the inner aggregation also only buckets things 
> matching the search doc type?
> 
> Thanks in advance...
> 
> John
> 
> 
> 
> 
