slim bouguerra created HIVE-19607:
-------------------------------------

             Summary: Pushing Aggregates on Top of Aggregates
                 Key: HIVE-19607
                 URL: https://issues.apache.org/jira/browse/HIVE-19607
             Project: Hive
          Issue Type: Sub-task
            Reporter: slim bouguerra
             Fix For: 3.1.0

The plan below shows a case where the outer count aggregate could be pushed down to Druid, which would eliminate the final reducer stage. Today only the inner groupBy on cstring2 (with the doubleSum) is pushed; Hive still runs count and sum on top of it.

{code}
+PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+PREHOOK: type: QUERY
+POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1
+            Map Operator Tree:
+                TableScan
+                  alias: druid_table
+                  properties:
+                    druid.fieldNames cstring2,$f1
+                    druid.fieldTypes string,double
+                    druid.query.json {"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
+                    druid.query.type groupBy
+                  Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+                  Select Operator
+                    expressions: cstring2 (type: string), $f1 (type: double)
+                    outputColumnNames: cstring2, $f1
+                    Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+                    Group By Operator
+                      aggregations: count(cstring2), sum($f1)
+                      mode: hash
+                      outputColumnNames: _col0, _col1
+                      Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                      Reduce Output Operator
+                        sort order:
+                        Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                        value expressions: _col0 (type: bigint), _col1 (type: double)
+        Reducer 2
+            Reduce Operator Tree:
+              Group By Operator
+                aggregations: count(VALUE._col0), sum(VALUE._col1)
+                mode: mergepartial
+                outputColumnNames: _col0, _col1
+                Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                File Output Operator
+                  compressed: false
+                  Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                  table:
+                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
{code}
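
To make the request concrete, here is a rough sketch of what a fully pushed-down query might look like, assuming the outer aggregates are expressed as a second Druid groupBy over a query-type dataSource that wraps the inner groupBy from the plan. This is only an illustration and not what Hive or Calcite generates: the helper name push_outer_aggregates and the output name $f0 are made up for the example, and the filtered count is one possible way to mimic count(cstring2) skipping the null group.

```python
import json


def push_outer_aggregates(inner_query: dict) -> dict:
    """Wrap an inner Druid groupBy in an outer groupBy (hypothetical helper).

    The outer query has no dimensions, so it returns a single row holding
    the aggregates that Hive currently computes in Map 1 / Reducer 2.
    """
    return {
        "queryType": "groupBy",
        # Druid accepts another query as the data source of a groupBy.
        "dataSource": {"type": "query", "query": inner_query},
        "granularity": "all",
        "dimensions": [],
        "aggregations": [
            {
                # Assumption: count only non-null cstring2 groups, to match
                # the semantics of Hive's count(cstring2).
                "type": "filtered",
                "filter": {
                    "type": "not",
                    "field": {
                        "type": "selector",
                        "dimension": "cstring2",
                        "value": None,
                    },
                },
                "aggregator": {"type": "count", "name": "$f0"},
            },
            # Sum the per-group partial sums produced by the inner query.
            {"type": "doubleSum", "name": "$f1", "fieldName": "$f1"},
        ],
        "intervals": inner_query["intervals"],
    }


# Inner query copied from druid.query.json in the plan above.
inner = {
    "queryType": "groupBy",
    "dataSource": "default.druid_table",
    "granularity": "all",
    "dimensions": [
        {
            "type": "default",
            "dimension": "cstring2",
            "outputName": "cstring2",
            "outputType": "STRING",
        }
    ],
    "limitSpec": {"type": "default"},
    "aggregations": [
        {"type": "doubleSum", "name": "$f1", "fieldName": "cdouble"}
    ],
    "intervals": ["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],
}

nested = push_outer_aggregates(inner)
print(json.dumps(nested, indent=2))
```

With a query of this shape Druid would return one row, so the hash Group By in Map 1 and the mergepartial Group By in Reducer 2 would no longer be needed.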