[
https://issues.apache.org/jira/browse/DRILL-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063351#comment-17063351
]
ASF GitHub Bot commented on DRILL-7652:
---------------------------------------
cgivre commented on pull request #2033: DRILL-7652: Add time_bucket() function
for time series analysis
URL: https://github.com/apache/drill/pull/2033
# [DRILL-7652](https://issues.apache.org/jira/browse/DRILL-7652): Add time_bucket() function for Time Series Analysis
## Description
This PR adds two UDFs, `time_bucket()` and `time_bucket_ns()`, that facilitate
time series analysis. It also updates the `README.md` in the `contrib/udf`
folder to document the new functions.
## Documentation
These functions are useful for doing time series analysis by grouping the
data into arbitrary intervals. For more examples, see:
https://blog.timescale.com/blog/simplified-time-series-analytics-using-the-time_bucket-function/
There are two versions of the function:
* `time_bucket(<timestamp>, <interval>)`
* `time_bucket_ns(<timestamp>, <interval>)`
Both functions take a `BIGINT` timestamp and an interval in milliseconds as
arguments. `time_bucket()` expects the timestamp in milliseconds, and
`time_bucket_ns()` expects it in nanoseconds. Each returns a timestamp in the
same unit as its input.
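To make the bucketing concrete, here is a minimal Java sketch of the arithmetic such a function might perform. It is not taken from this PR's source. The class name `TimeBucketSketch` and both method names are hypothetical, and it assumes a bucket is identified by the largest multiple of the interval that is less than or equal to the timestamp. Using `Math.floorMod` so that negative timestamps also round down is a choice made for this sketch.

```java
public class TimeBucketSketch {
    private static final long NANOS_PER_MILLI = 1_000_000L;

    /** Floors a millisecond timestamp to the start of its bucket (assumed semantics). */
    public static long timeBucket(long timestampMs, long intervalMs) {
        requirePositive(intervalMs);
        // floorMod keeps the remainder non-negative, so the result never exceeds the input.
        return timestampMs - Math.floorMod(timestampMs, intervalMs);
    }

    /** Same idea for nanosecond timestamps; the interval is still given in milliseconds. */
    public static long timeBucketNs(long timestampNs, long intervalMs) {
        requirePositive(intervalMs);
        // Convert the interval to nanoseconds, failing loudly on overflow.
        long intervalNs = Math.multiplyExact(intervalMs, NANOS_PER_MILLI);
        return timestampNs - Math.floorMod(timestampNs, intervalNs);
    }

    private static void requirePositive(long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + intervalMs);
        }
    }

    public static void main(String[] args) {
        long fiveMinutesMs = 300_000L;
        // 43 seconds past a five-minute boundary, in milliseconds.
        System.out.println(timeBucket(1_585_699_543_000L, fiveMinutesMs));
        // The same instant plus 123 ns, in nanoseconds.
        System.out.println(timeBucketNs(1_585_699_543_000_000_123L, fiveMinutesMs));
    }
}
```

Both calls in `main` land in the same five-minute bucket: the first prints `1585699500000` and the second prints `1585699500000000000`, each in the unit of its input.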
### Example:
The query below calculates the average of the `cpu` metric for each
five-minute interval and returns the 12 most recent intervals.
```sql
SELECT time_bucket(time_stamp, 300000) AS five_min, avg(cpu)
FROM metrics
GROUP BY five_min
ORDER BY five_min DESC LIMIT 12;
```
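For readers who want to see what a bucketed average computes, the following Java sketch mirrors the grouping and ordering on in-memory data, using a 300000 ms (five-minute) interval. It is illustrative only: the `BucketAverageSketch` class, the `Metric` row type, and the sample values are all hypothetical, the bucketing arithmetic is an assumption rather than the PR's implementation, and the `LIMIT` clause is omitted.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BucketAverageSketch {
    /** Hypothetical row standing in for one record of the `metrics` table. */
    public static final class Metric {
        public final long timeStampMs;
        public final double cpu;

        public Metric(long timeStampMs, double cpu) {
            this.timeStampMs = timeStampMs;
            this.cpu = cpu;
        }
    }

    /** Assumed bucketing rule: floor the timestamp to a multiple of the interval. */
    public static long bucketStart(long timestampMs, long intervalMs) {
        return timestampMs - Math.floorMod(timestampMs, intervalMs);
    }

    /** Average cpu per bucket, newest bucket first (GROUP BY plus ORDER BY ... DESC). */
    public static Map<Long, Double> averageCpuPerBucket(List<Metric> rows, long intervalMs) {
        return rows.stream().collect(Collectors.groupingBy(
                m -> bucketStart(m.timeStampMs, intervalMs),
                () -> new TreeMap<Long, Double>(Comparator.<Long>reverseOrder()),
                Collectors.averagingDouble(m -> m.cpu)));
    }

    public static void main(String[] args) {
        List<Metric> rows = Arrays.asList(
                new Metric(1_585_699_543_000L, 10.0),  // falls in bucket 1585699500000
                new Metric(1_585_699_560_000L, 30.0),  // same bucket
                new Metric(1_585_699_800_000L, 50.0)); // starts the next bucket
        averageCpuPerBucket(rows, 300_000L)
                .forEach((bucket, avg) -> System.out.println(bucket + " -> " + avg));
    }
}
```

The reverse-ordered `TreeMap` stands in for `ORDER BY five_min DESC`, so iterating the result yields the most recent bucket first.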
## Testing
A series of unit tests is included with this PR.
> Add time_bucket() Function for Time Series Analysis
> ---------------------------------------------------
>
> Key: DRILL-7652
> URL: https://issues.apache.org/jira/browse/DRILL-7652
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.18.0
>