Alex Behm has posted comments on this change. Change subject: IMPALA-2502: don't redundantly repartition grouping aggregations ......................................................................
Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/2414/4/fe/src/main/java/com/cloudera/impala/planner/DataPartition.java File fe/src/main/java/com/cloudera/impala/planner/DataPartition.java: Line 47: public DataPartition(TPartitionType type, List<Expr> exprs) { let's make this private now http://gerrit.cloudera.org:8080/#/c/2414/4/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java: Line 743: throws ImpalaException, InternalException { ImpalaException should be sufficient Line 756: boolean childPartitionEquivalent = ctx_.getRootAnalyzer().equivSets(partitionExprs, Would be nice to keep the var naming consistent within this file. For the join we have something like lhsHasCompatPartition. I'm fine with changing the var names in the join of the one here. Line 760: // the aggregation in the child fragment. .. in the child fragment without an extra merge step. Line 809: throws ImpalaException, InternalException { ImpalaException should be sufficient http://gerrit.cloudera.org:8080/#/c/2414/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test: Line 920: from tpch_parquet.customer inner join [shuffle] tpch_parquet.orders on c_custkey = o_custkey add a having clause and a limit to make sure those work correctly also add a new test where with such an aggregation inside an inline view, and the parent block has a limit + where clause that should be assigned to the aggregation (some of these cases are covered by more complicated test suggested below) Line 947: select straight_join count(distinct c_custkey) Add or modify an existing test where there are interesting expressions on the affected slots. Line 979: select straight_join c_custkey, count(distinct c_custkey) Add a new more complicated test that combined two joins and an aggregation in the same fragment, plus some inline view indirection. Something like: select id, bigint_col, count(*) from ( select straight_join t1.id id, t2.bigint_col as bigint_col inner join [shuffle] t2 on t1.id = t2.id and t1.int_col = t2.bigint_col inner join t3 on t2.id = t3.id and t2.bigint_col = t3.id ) v group by 1, 2 having count(*) > 10 limit 10 -- To view, visit http://gerrit.cloudera.org:8080/2414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iffdcfd3629b8a69bd23915e1adba3b8323cbbaef Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
