-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/
-----------------------------------------------------------

Review request for Tajo.


Bugs: TAJO-601
    https://issues.apache.org/jira/browse/TAJO-601


Repository: tajo


Description
-------

Currently, distinct aggregation queries are executed as follows:
* the first stage: it just shuffles tuples by hashing grouping keys.
* the second stage: it sorts them and executes sort aggregation.

This way executes queries including distinct aggregation functions with only 
two stages. But, it leads to large intermediate data during shuffle phase.

This kind of query can be rewritten as two queries:

[Original query]
----------
SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col from 
rel1 group by grp1, grp2;
----------

[Rewritten query]
----------
SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
  SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3) 
tmp1 group by grp1, grp2
) table1;
----------

I'm expecting that this rewrite will significantly reduce the intermediate data 
volume and query response time in most cases.


Diffs
-----

  tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d4 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java
 da05739 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java
 10fd720 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java
 b14c448 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java
 f7c0bfa 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
 624518b 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java
 6dac031 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java
 efa1e05 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
 f390b52 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java
 91f658d 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java
 a0c0eeb 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java
 399903c 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java
 e5f7fb4 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java
 633d0c1 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java
 ae6d5eb 
  
tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java
 3c30e38 
  
tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java
 d756242 
  
tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java
 1f80bce 
  
tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java
 053c028 
  
tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java
 2d3124d 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql
 6fe604e 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql
 6bf8a8a 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation1.sql
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation2.sql
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result
 f2ad32a 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result
 9164120 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation1.result
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation2.result
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result
 PRE-CREATION 
  
tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result
 PRE-CREATION 

Diff: https://reviews.apache.org/r/18210/diff/


Testing
-------

mvn clean install


Thanks,

Hyunsik Choi

Reply via email to