[ https://issues.apache.org/jira/browse/HIVE-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
slim bouguerra reassigned HIVE-18226:
-------------------------------------
Assignee: slim bouguerra
> handle UDF to double/int over aggregate
> ---------------------------------------
>
> Key: HIVE-18226
> URL: https://issues.apache.org/jira/browse/HIVE-18226
> Project: Hive
> Issue Type: Sub-task
> Components: Druid integration
> Reporter: slim bouguerra
> Assignee: slim bouguerra
>
> In cases like the following query, the Hive planner adds an extra UDFToDouble
> cast over integer columns.
> This kind of UDF can be pushed down to Druid by using a doubleSum aggregator
> instead of longSum, and vice versa.
> {code}
> PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*) FROM druid_table GROUP BY floor_year(`__time`)
> PREHOOK: type: QUERY
> POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*) FROM druid_table GROUP BY floor_year(`__time`)
> POSTHOOK: type: QUERY
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: druid_table
>             properties:
>               druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
>               druid.query.type timeseries
>             Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Select Operator
>               expressions: __time (type: timestamp with local time zone), (UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
>               outputColumnNames: _col0, _col1
>               Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 table:
>                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
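
As a rough illustration of the rewrite described in the issue, the sketch below takes the Druid query from the plan above and switches the sum aggregator feeding `UDFToDouble($f1)` from `longSum` to `doubleSum`. The function name `push_cast_into_sum` and the `CAST_TO_AGGREGATOR` table are hypothetical and exist only for this example; this is not how the Hive planner implements the pushdown, and the `count` aggregator is left untouched, so its cast would still run in Hive.

```python
import json

# Hypothetical mapping (an assumption for this sketch): the Hive cast UDF
# applied over a sum, and the Druid sum aggregator that produces that type.
CAST_TO_AGGREGATOR = {
    "UDFToDouble": "doubleSum",
    "UDFToInteger": "longSum",
    "UDFToLong": "longSum",
}


def push_cast_into_sum(druid_query, output_name, cast_udf):
    """Return a copy of druid_query in which the sum aggregator named
    output_name uses the aggregator type implied by cast_udf."""
    target = CAST_TO_AGGREGATOR[cast_udf]
    rewritten = json.loads(json.dumps(druid_query))  # deep copy via JSON
    for agg in rewritten["aggregations"]:
        # Only sum aggregators are rewritten; count and others stay as-is.
        if agg["name"] == output_name and agg["type"] in ("longSum", "doubleSum"):
            agg["type"] = target
    return rewritten


# Trimmed version of the druid.query.json value shown in the plan.
query = {
    "queryType": "timeseries",
    "dataSource": "default.druid_table",
    "granularity": "year",
    "aggregations": [
        {"type": "longSum", "name": "$f1", "fieldName": "ctinyint"},
        {"type": "count", "name": "$f2"},
    ],
}

rewritten = push_cast_into_sum(query, "$f1", "UDFToDouble")
print(rewritten["aggregations"][0]["type"])  # doubleSum
```

With the aggregator changed this way, `$f1` would already come back from Druid as a double, so the Select Operator would no longer need `UDFToDouble($f1)`.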
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)