-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34962
-----------------------------------------------------------

Hyunsik, thank you for waiting.

I tested the patch on my local cluster, but the validation for distinct aggregations on different columns doesn't work as expected. For example, the following queries finished without a PlanningException:

- select count(distinct id), sum(distinct score) from table1
- select id, count(distinct id), sum(distinct name) from table1 group by id

For reference, I created the table as described on the Tajo wiki. I also found that the validation code is never called. Please check this.

If that's okay with you, I'd also like to suggest adding unit test cases for unsupported queries. If you think that's a waste of resources, feel free to disregard this. :)

- Jung JaeHwa


On Feb. 18, 2014, 12:03 p.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18210/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 12:03 p.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-601
>     https://issues.apache.org/jira/browse/TAJO-601
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> Currently, distinct aggregation queries are executed as follows:
> * the first stage: it just shuffles tuples by hashing grouping keys.
> * the second stage: it sorts them and executes sort aggregation.
> 
> This way executes queries including distinct aggregation functions with only
> two stages. But, it leads to large intermediate data during shuffle phase.
> 
> This kind of query can be rewritten as two queries:
> 
> [Original query]
> ----------
> SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col
> from rel1 group by grp1, grp2;
> ----------
> 
> [Rewritten query]
> ----------
> SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
>   SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2,
>   grp3
> ) tmp1 group by grp1, grp2;
> ----------
> 
> I'm expecting that this rewrite will significantly reduce the intermediate
> data volume and query response time in most cases.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d4
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd720
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfa
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 624518b
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java efa1e05
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658d
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java a0c0eeb
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb4
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5eb
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e38
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d756242
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c028
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql 6fe604e
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql 6bf8a8a
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation1.sql PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation2.sql PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result f2ad32a
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result 9164120
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation1.result PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation2.result PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/18210/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>
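
One way to sanity-check the rewrite in the quoted description is to run the original and the rewritten query against the same small table and compare the results. The sketch below does this with Python's standard sqlite3 module instead of Tajo, so it only checks the SQL semantics of the two-level aggregation, not Tajo's planner or its distributed execution. The table contents are made up for illustration, and `run_both` is a hypothetical helper name. An ORDER BY is added to both queries only to make the result lists directly comparable.

```python
import sqlite3


def run_both(rows):
    """Run the original and the rewritten query on the same rows.

    Returns (original_result, rewritten_result) as lists of tuples.
    SQLite stands in for Tajo here, purely to compare SQL semantics.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rel1 (grp1 TEXT, grp2 TEXT, grp3 INTEGER)")
    conn.executemany("INSERT INTO rel1 VALUES (?, ?, ?)", rows)

    # Original form: distinct aggregation in a single GROUP BY.
    original = conn.execute(
        "SELECT grp1, grp2, count(*) AS total, "
        "count(DISTINCT grp3) AS distinct_col "
        "FROM rel1 GROUP BY grp1, grp2 ORDER BY grp1, grp2"
    ).fetchall()

    # Rewritten form: the inner query deduplicates grp3 per (grp1, grp2),
    # the outer query sums the partial counts and counts non-NULL grp3.
    rewritten = conn.execute(
        "SELECT grp1, grp2, sum(cnt) AS total, "
        "count(grp3) AS distinct_col FROM ("
        "  SELECT grp1, grp2, grp3, count(*) AS cnt "
        "  FROM rel1 GROUP BY grp1, grp2, grp3"
        ") tmp1 GROUP BY grp1, grp2 ORDER BY grp1, grp2"
    ).fetchall()

    conn.close()
    return original, rewritten


# Illustrative data, including a duplicate and a NULL in grp3.
rows = [
    ("a", "x", 1),
    ("a", "x", 1),
    ("a", "x", 2),
    ("a", "y", 3),
    ("b", "x", None),
    ("b", "x", 4),
    ("b", "x", 4),
]

original, rewritten = run_both(rows)
print(original)
print(rewritten)
print("equivalent:", original == rewritten)
```

The NULL row is included on purpose: count(distinct grp3) ignores NULLs, and count(grp3) in the outer query ignores the NULL group produced by the inner query, so both forms should agree on that case as well.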
