[ 
https://issues.apache.org/jira/browse/TAJO-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154391#comment-14154391
 ] 

ASF GitHub Bot commented on TAJO-1010:
--------------------------------------

Github user blrunner commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/136#discussion_r18262290
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java ---
    @@ -816,9 +820,30 @@ public static int calculateShuffleOutputNum(SubQuery subQuery, DataChannel chann
             if (grpNode.getType() == NodeType.GROUP_BY) {
               hasGroupColumns = ((GroupbyNode)grpNode).getGroupingColumns().length > 0;
             } else if (grpNode.getType() == NodeType.DISTINCT_GROUP_BY) {
    -          hasGroupColumns = ((DistinctGroupbyNode)grpNode).getGroupingColumns().length > 0;
    +          // Find current distinct stage node.
    +          DistinctGroupbyNode distinctNode = PlannerUtil.findMostBottomNode(subQuery.getBlock().getPlan(), NodeType.DISTINCT_GROUP_BY);
    +          if (distinctNode == null) {
    +            LOG.warn(subQuery.getId() + ", Can't find current DistinctGroupbyNode");
    +            distinctNode = (DistinctGroupbyNode)grpNode;
    --- End diff --
    
    I found a bug on a production cluster, so I had to add the code above.
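
For readers unfamiliar with the pattern in the diff, the following is a minimal, self-contained sketch of the same idea: search a plan for the bottom-most node of a given type, and fall back to the node already in hand (with a warning) if the search finds nothing. The `Node` class, the `Kind` enum, and the `findBottomMost` and `resolve` methods are hypothetical stand-ins for illustration, not Tajo's `PlannerUtil` API, and the plan is assumed to be a simple single-child chain.

```java
import java.util.logging.Logger;

class FallbackLookupSketch {
  private static final Logger LOG = Logger.getLogger("FallbackLookupSketch");

  // Hypothetical node kinds; not Tajo's NodeType.
  enum Kind { SCAN, GROUP_BY, DISTINCT_GROUP_BY }

  // Hypothetical plan node with at most one child (assumption: a linear plan).
  static final class Node {
    final Kind kind;
    final String name;
    final Node child;

    Node(Kind kind, String name, Node child) {
      this.kind = kind;
      this.name = name;
      this.child = child;
    }
  }

  // Walks from the root toward the leaf and returns the deepest node
  // of the requested kind, or null if the plan contains none.
  static Node findBottomMost(Node root, Kind kind) {
    Node found = null;
    for (Node n = root; n != null; n = n.child) {
      if (n.kind == kind) {
        found = n;
      }
    }
    return found;
  }

  // Mirrors the null check in the diff: prefer the bottom-most
  // DISTINCT_GROUP_BY node, otherwise warn and use the fallback node.
  static Node resolve(Node plan, Node fallback) {
    Node found = findBottomMost(plan, Kind.DISTINCT_GROUP_BY);
    if (found == null) {
      LOG.warning("Can't find a DISTINCT_GROUP_BY node; using " + fallback.name);
      return fallback;
    }
    return found;
  }

  public static void main(String[] args) {
    Node scan = new Node(Kind.SCAN, "scan", null);
    Node inner = new Node(Kind.DISTINCT_GROUP_BY, "inner", scan);
    Node outer = new Node(Kind.DISTINCT_GROUP_BY, "outer", inner);
    System.out.println(resolve(outer, outer).name);
  }
}
```

The fallback keeps the calculation going when the lookup fails, at the cost of possibly using a node from a different stage, which is why the warning is logged.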


> Improve multiple DISTINCT aggregation.
> --------------------------------------
>
>                 Key: TAJO-1010
>                 URL: https://issues.apache.org/jira/browse/TAJO-1010
>             Project: Tajo
>          Issue Type: Improvement
>          Components: planner/optimizer
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.9.0
>
>
> Currently, Tajo provides a three-stage optimization for distinct
> aggregation queries, but it supports only one column for distinct
> aggregation, as follows:
> {code:title=Query1|borderStyle=solid}
> select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
> from table1 a
> group by a.flag
> {code}
> If you use two or more columns for distinct aggregation, the optimized
> distinct aggregation cannot be applied, as in the following query:
> {code:title=Query2|borderStyle=solid}
> select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
> , count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
> from table1 a
> group by a.flag
> {code}
> In this case, the query may perform poorly. Thus, we need to improve
> multiple DISTINCT aggregation. Specifically, we should support the
> three-stage optimization for multiple DISTINCT aggregation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
