# [Feature] Moving Average query type

## A groupBy-wrapping query type, optimized for performing moving-average calculations.

Note: The concept of a moving average is also known as a rolling average or running average.
## Background:

In general terms, a Moving Average is a calculation performed on top of a time series in order to smooth out fluctuations.

[Figure: example of the smoothing effect of a moving average function]

A simple example would be a trailing seven-day average of page views. The value for each day in the result would be the average page views of the last 7 days as of that day.

Additional theoretical background can be found on Wikipedia's [Moving Average](https://en.wikipedia.org/wiki/Moving_average) page.

## Problem:

Currently, in order to compute a moving average with Druid, one would need to union multiple Timeseries/GroupBy queries, one per day of the result. In addition to being cumbersome, that approach is also less efficient, as it requires multiple passes per row.

## Solution:

We propose a new query type, **movingAverage**, which wraps a [groupBy query](http://druid.io/docs/latest/querying/groupbyquery.html) (or a [timeseries query](http://druid.io/docs/latest/querying/timeseriesquery.html) when there are no dimensions). At a high level, the movingAverage query does the following:

1. Runs an inner query (of type groupBy or timeseries) to get the initial daily aggregations.
2. Computes the moving-average function based on the inner query results.
3. Returns combined records with both the simple aggregation and the moving average.

This allows the query to avoid multiple segment passes and aggregations per granularity period.

In order to allow a flexible definition of the moving average function, the movingAverage query introduces a new interface called **Averager**. The Averager is somewhat similar to the **Aggregator**, but while the Aggregator's input is a metric from the datasource, the Averager's input is an Aggregator from the query.

## Example:

This example is based on the `wikipedia` datasource available via the [tutorial examples package](http://druid.io/docs/latest/tutorials/tutorial-examples.tar.gz).
The supplied datasource has only a single day's worth of data, so we will use 30-minute periods instead of the usual daily period.

_Note: I have chosen a granularity period of 30 minutes in order to have enough data points. In reality, the current implementation doesn't fully support sub-daily granularity, but it should require only minor changes to accommodate such an enhancement._

Let's use the `delta` metric in the `wikipedia` datasource. Say we want to compute the 7-period mean of `delta` over 30-minute periods. We will define both an aggregator and an averager for this task using the movingAverage query syntax:

```json
{
  "queryType": "movingAverage",
  "dataSource": "wikipedia",
  "granularity": {
    "type": "period",
    "period": "PT30M"
  },
  "intervals": [
    "2015-09-12T00:00:00Z/2015-09-13T00:00:00Z"
  ],
  "aggregations": [
    {
      "name": "delta30Min",
      "fieldName": "delta",
      "type": "longSum"
    }
  ],
  "averagers": [
    {
      "name": "trailing30MinChanges",
      "fieldName": "delta30Min",
      "type": "longMean",
      "buckets": 7
    }
  ]
}
```

Note that this syntax is derived from groupBy, with the addition of the **averagers** JSON array. Each entry has the following fields:

- **name**: Output name.
- **fieldName**: Input (aggregator) name.
- **type**: Formula type (longMean/doubleMean/doubleMax/etc. The full list will be included in the documentation).
- **buckets**: Number of buckets to look back.
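To make the arithmetic behind a `longMean` averager with `"buckets": 7` concrete, here is a minimal Python sketch of a trailing mean over per-period aggregates. It is an illustration only, not the proposed implementation: `trailing_mean` is a hypothetical helper, and the `delta30Min` inputs are copied from the sample result in this issue. Dividing by the full bucket count even when fewer periods are available is an assumption inferred from those sample numbers (for instance, 30490 / 7 ≈ 4355.71), not confirmed behavior.

```python
from collections import deque
from typing import Iterable, List


def trailing_mean(values: Iterable[float], buckets: int) -> List[float]:
    """Trailing mean over the last `buckets` values.

    Assumption: the window sum is always divided by `buckets`, so periods
    that precede the start of the data effectively count as zero.
    """
    window = deque(maxlen=buckets)  # the oldest value is evicted automatically
    running_sum = 0
    result = []
    for value in values:
        if len(window) == buckets:
            running_sum -= window[0]  # subtract the value about to be evicted
        window.append(value)
        running_sum += value
        result.append(running_sum / buckets)
    return result


# Per-period aggregates (delta30Min), copied from the sample result in this issue.
delta_30_min = [30490, 96526, 87887, 254632]
trailing_30_min_changes = trailing_mean(delta_30_min, buckets=7)

for delta, mean in zip(delta_30_min, trailing_30_min_changes):
    print(delta, mean)
```

The running sum keeps each step O(1), which mirrors the idea of computing the average from the inner query's per-period results in a single pass instead of re-aggregating the raw rows for every window.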
The query result format is inherited from groupBy:

```json
[
  {
    "version": "v1",
    "timestamp": "2015-09-12T00:30:00.000Z",
    "event": {
      "delta30Min": 30490,
      "trailing30MinChanges": 4355.714285714285
    }
  },
  {
    "version": "v1",
    "timestamp": "2015-09-12T01:00:00.000Z",
    "event": {
      "delta30Min": 96526,
      "trailing30MinChanges": 18145.14285714286
    }
  },
  {
    "version": "v1",
    "timestamp": "2015-09-12T01:30:00.000Z",
    "event": {
      "delta30Min": 87887,
      "trailing30MinChanges": 30700.428571428572
    }
  },
  {
    "version": "v1",
    "timestamp": "2015-09-12T02:00:00.000Z",
    "event": {
      "delta30Min": 254632,
      "trailing30MinChanges": 67076.42857142857
    }
  }
]
```

A graph of the result shows a smoothing effect:

[Figure: graph of the query result]

There are a few more advanced aspects to the implementation and usage of movingAverage. Those will be available in the pull request (via the code and the documentation).

[ Full content available at: https://github.com/apache/incubator-druid/issues/6320 ]
