[Impala-CR](cdh5-trunk) IMPALA-2502: don't redundantly repartition grouping aggregations

Alex Behm (Code Review) Fri, 04 Mar 2016 15:51:07 -0800

Alex Behm has posted comments on this change.

Change subject: IMPALA-2502: don't redundantly repartition grouping aggregations
......................................................................



Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/2414/4/fe/src/main/java/com/cloudera/impala/planner/DataPartition.java
File fe/src/main/java/com/cloudera/impala/planner/DataPartition.java:

Line 47:   public DataPartition(TPartitionType type, List<Expr> exprs) {
let's make this private now


http://gerrit.cloudera.org:8080/#/c/2414/4/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 743:       throws ImpalaException, InternalException {
ImpalaException should be sufficient


Line 756:       boolean childPartitionEquivalent = 
ctx_.getRootAnalyzer().equivSets(partitionExprs,
Would be nice to keep the var naming consistent within this file. For the join 
we have something like lhsHasCompatPartition. I'm fine with changing the var 
names in the join of the one here.


Line 760:         // the aggregation in the child fragment.
.. in the child fragment without an extra merge step.


Line 809:       throws ImpalaException, InternalException {
ImpalaException should be sufficient


http://gerrit.cloudera.org:8080/#/c/2414/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 920: from tpch_parquet.customer inner join [shuffle] tpch_parquet.orders 
on c_custkey = o_custkey
add a having clause and a limit to make sure those work correctly

also add a new test where with such an aggregation inside an inline view, and 
the parent block has a limit + where clause that should be assigned to the 
aggregation

(some of these cases are covered by more complicated test suggested below)


Line 947: select straight_join count(distinct c_custkey)
Add or modify an existing test where there are interesting expressions on the 
affected slots.


Line 979: select straight_join c_custkey, count(distinct c_custkey)
Add a new more complicated test that combined two joins and an aggregation in 
the same fragment, plus some inline view indirection. Something like:

 select id, bigint_col, count(*) from (
  select straight_join t1.id id, t2.bigint_col as bigint_col
  inner join [shuffle] t2
    on t1.id = t2.id and t1.int_col = t2.bigint_col
  inner join t3
    on t2.id = t3.id and t2.bigint_col = t3.id
) v
group by 1, 2
having count(*) > 10
limit 10


-- 
To view, visit http://gerrit.cloudera.org:8080/2414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iffdcfd3629b8a69bd23915e1adba3b8323cbbaef
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2502: don't redundantly repartition grouping aggregations

Reply via email to