slim bouguerra created HIVE-16026:
-------------------------------------
Summary: Generated query will timeout and/or kill the druid
cluster.
Key: HIVE-16026
URL: https://issues.apache.org/jira/browse/HIVE-16026
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: slim bouguerra
Grouping by `__time` and another dimension generates a query with granularity
NONE and an interval from 1970 to 3000. This will kill the Druid cluster
because the Druid groupBy strategy creates a cursor for every millisecond, and
there are a lot of milliseconds between 1970 and 3000. Such a query can instead
be turned into a select query, with the group by then done within Hive. This
should only happen when we don't know the `__time` granularity.
{code}
explain select `__time`, userid from login_druid group by `__time`, userid
> ;
OK
Plan optimized by CBO.
Stage-0
Fetch Operator
limit:-1
Select Operator [SEL_1]
Output:["_col0","_col1"]
TableScan [TS_0]
Output:["__time","userid"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_user_login\",\"granularity\":\"NONE\",\"dimensions\":[\"userid\"],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"dummy_agg\",\"fieldName\":\"dummy_agg\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}
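To give a rough sense of the scale described above, here is a minimal sketch that counts the milliseconds in the 1970 to 3000 interval. It assumes, as the description states, one cursor per millisecond at granularity NONE; the helper name `millis_between` is hypothetical and not part of Hive or Druid.

```python
from datetime import datetime, timedelta, timezone


def millis_between(start: datetime, end: datetime) -> int:
    """Return the number of whole milliseconds between start and end."""
    return (end - start) // timedelta(milliseconds=1)


# Interval bounds taken from the description (1970 to 3000), in UTC.
start = datetime(1970, 1, 1, tzinfo=timezone.utc)
end = datetime(3000, 1, 1, tzinfo=timezone.utc)

# Upper bound on per-millisecond cursors, under the stated assumption.
print(millis_between(start, end))  # 32503680000000
```

That is roughly 3.25e13 potential time buckets for a single query, which is consistent with the timeout reported here.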