[jira] [Created] (HIVE-19607) Pushing Aggregates on Top of Aggregates

slim bouguerra (JIRA) Fri, 18 May 2018 12:33:23 -0700

slim bouguerra created HIVE-19607:
-------------------------------------

             Summary: Pushing Aggregates on Top of Aggregates
                 Key: HIVE-19607
                 URL: https://issues.apache.org/jira/browse/HIVE-19607
             Project: Hive
          Issue Type: Sub-task
            Reporter: slim bouguerra
             Fix For: 3.1.0



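This plan shows a case where the count aggregate could also be pushed to Druid, 
which would eliminate the final-stage reducer. Currently only the inner groupBy 
on cstring2 (with its doubleSum) is pushed to Druid; Hive still runs 
count(cstring2) and sum($f1) on top of the Druid result in Map 1 and Reducer 2.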

{code}
+PREHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+PREHOOK: type: QUERY
+POSTHOOK: query: EXPLAIN select count(DISTINCT cstring2), sum(cdouble) FROM druid_table
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1
+            Map Operator Tree:
+                TableScan
+                  alias: druid_table
+                  properties:
+                    druid.fieldNames cstring2,$f1
+                    druid.fieldTypes string,double
+                    druid.query.json {"queryType":"groupBy","dataSource":"default.druid_table","granularity":"all","dimensions":[{"type":"default","dimension":"cstring2","outputName":"cstring2","outputType":"STRING"}],"limitSpec":{"type":"default"},"aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"cdouble"}],"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]}
+                    druid.query.type groupBy
+                  Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+                  Select Operator
+                    expressions: cstring2 (type: string), $f1 (type: double)
+                    outputColumnNames: cstring2, $f1
+                    Statistics: Num rows: 9173 Data size: 1673472 Basic stats: COMPLETE Column stats: NONE
+                    Group By Operator
+                      aggregations: count(cstring2), sum($f1)
+                      mode: hash
+                      outputColumnNames: _col0, _col1
+                      Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                      Reduce Output Operator
+                        sort order:
+                        Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                        value expressions: _col0 (type: bigint), _col1 (type: double)
+        Reducer 2
+            Reduce Operator Tree:
+              Group By Operator
+                aggregations: count(VALUE._col0), sum(VALUE._col1)
+                mode: mergepartial
+                outputColumnNames: _col0, _col1
+                Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                File Output Operator
+                  compressed: false
+                  Statistics: Num rows: 1 Data size: 208 Basic stats: COMPLETE Column stats: NONE
+                  table:
+                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
+                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
{code}
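
For reference, the rewrite visible in the plan above can be sketched as a two-level aggregation: an inner GROUP BY on cstring2 with a partial sum (the part already sent to Druid as a groupBy query), and an outer count/sum over that result (the part still executed by Hive). The snippet below is only an illustration of that equivalence, not Hive or Druid code. It uses Python's built-in sqlite3 as a stand-in SQL engine; the sample rows and the alias f1 (standing in for $f1) are made up, and SQLite's NULL handling is not guaranteed to match Druid's.

```python
import sqlite3

# Hypothetical stand-in for druid_table. Column names follow the plan above;
# the rows are invented for illustration only.
ROWS = [
    ("a", 1.5),
    ("a", 2.5),
    ("b", 4.0),
    (None, 8.0),  # NULL key: skipped by count, still contributes to sum
    ("c", None),  # NULL measure: ignored by sum
]

# The query as written in the issue.
ORIGINAL = """
SELECT count(DISTINCT cstring2), sum(cdouble)
FROM druid_table
"""

# Two-level form. The inner query mirrors the groupBy in druid.query.json,
# the outer query mirrors the Hive Group By Operator (count(cstring2), sum($f1)).
TWO_LEVEL = """
SELECT count(cstring2), sum(f1)
FROM (
    SELECT cstring2, sum(cdouble) AS f1
    FROM druid_table
    GROUP BY cstring2
) AS t
"""


def run_both(rows):
    """Load rows into an in-memory table and run both query forms."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE druid_table (cstring2 TEXT, cdouble REAL)")
    conn.executemany("INSERT INTO druid_table VALUES (?, ?)", rows)
    original = conn.execute(ORIGINAL).fetchone()
    two_level = conn.execute(TWO_LEVEL).fetchone()
    conn.close()
    return original, two_level


if __name__ == "__main__":
    original, two_level = run_both(ROWS)
    print("original :", original)
    print("two-level:", two_level)
```

On this sample data both forms return the same row. The improvement requested here is about where the outer count and sum run: if they were evaluated by Druid as well, the Hive Group By Operator and Reducer 2 would no longer be needed.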



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
