[
https://issues.apache.org/jira/browse/HIVE-16026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
slim bouguerra updated HIVE-16026:
----------------------------------
Fix Version/s: 3.0.0
> Generated query will timeout and/or kill the druid cluster.
> -----------------------------------------------------------
>
> Key: HIVE-16026
> URL: https://issues.apache.org/jira/browse/HIVE-16026
> Project: Hive
> Issue Type: Bug
> Components: Druid integration
> Reporter: slim bouguerra
> Priority: Major
> Fix For: 3.0.0
>
>
> Grouping by `__time` and another dimension generates a groupBy query with
> granularity NONE and an interval from 1970 to 3000. This will kill the Druid
> cluster, because the Druid groupBy strategy creates a cursor for every
> millisecond and there are a lot of milliseconds between 1970 and 3000. Such a
> query can instead be turned into a select, with the group by done within
> Hive. This fallback should only apply when we don't know the `__time`
> granularity.
> {code}
> explain select `__time`, userid from login_druid group by `__time`, userid;
> OK
> Plan optimized by CBO.
> Stage-0
> Fetch Operator
> limit:-1
> Select Operator [SEL_1]
> Output:["_col0","_col1"]
> TableScan [TS_0]
> Output:["__time","userid"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_user_login\",\"granularity\":\"NONE\",\"dimensions\":[\"userid\"],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"dummy_agg\",\"fieldName\":\"dummy_agg\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
> {code}
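To put a number on "a lot of milliseconds", here is a small Python sketch that counts the millisecond buckets covered by a Druid-style interval string. The helper name `interval_millis` is made up for this illustration, and the one-cursor-per-millisecond cost model is taken from the description above rather than measured against Druid.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the "intervals" field of the plan above.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def interval_millis(interval: str) -> int:
    """Return the number of milliseconds covered by a 'start/end' interval."""
    start_text, end_text = interval.split("/")
    start = datetime.strptime(start_text, ISO_FORMAT)
    end = datetime.strptime(end_text, ISO_FORMAT)
    # Integer division of timedeltas avoids float rounding on large spans.
    return (end - start) // timedelta(milliseconds=1)


# Interval quoted in the description (1970 to 3000).
described = interval_millis("1970-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z")
# Interval that actually appears in the generated druid.query.json.
generated = interval_millis("1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z")

print(described)  # 32503680000000
print(generated)  # 34712668800000
```

Either interval spans more than 3 * 10^13 milliseconds, so if each one needs its own cursor, the query cannot finish in any reasonable time.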