[ 
https://issues.apache.org/jira/browse/TAJO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896731#comment-13896731
 ] 

Hyunsik Choi commented on TAJO-593:
-----------------------------------

Created a review request against branch master in reviewboard 
https://reviews.apache.org/r/17905/


> outer groupby and groupby in derived table causes only one shuffle output 
> number
> --------------------------------------------------------------------------------
>
>                 Key: TAJO-593
>                 URL: https://issues.apache.org/jira/browse/TAJO-593
>             Project: Tajo
>          Issue Type: Bug
>          Components: distributed query plan
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating
>
>         Attachments: TAJO-593.patch
>
>
> See the following query case:
> {code:sql}
> select count(*) from (select l_orderkey, l_partkey, count(*) from lineitem 
> group by l_orderkey, l_partkey) t1;
> {code}
> In this case, SubQuery::calculateShuffleOutputNum() is called twice to 
> choose the number of shuffle outputs. Each time, 
> SubQuery::calculateShuffleOutputNum() looks up a GroupByNode to determine the 
> number of grouping keys. Here is the bug: 
> SubQuery::calculateShuffleOutputNum() always picks the topmost GroupByNode. In 
> most cases, this works well. However, a query with an outer groupby and a 
> groupby in a derived table exposes the problem. In this case, we must use the 
> bottommost groupby node. In fact, that is always the correct choice.
> This patch fixes SubQuery::calculateShuffleOutputNum() to use the bottommost 
> groupby node.
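
The difference between the two lookups described above can be illustrated with a minimal, self-contained Java sketch. It is not taken from TAJO-593.patch: the classes `PlanNode` and `GroupBy`, the helper names, and the single-child (linear) plan shape are all hypothetical simplifications, not Tajo's actual logical plan API. The plan models the query in the report, where the outer count(*) has no grouping keys (presumably why only one shuffle output was chosen) while the inner groupby has two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these are NOT Tajo's real plan classes.
class ShuffleKeySketch {

    // Simplified plan node with at most one child (real plans can branch, e.g. joins).
    static class PlanNode {
        final String name;
        final PlanNode child; // null for a leaf such as a table scan

        PlanNode(String name, PlanNode child) {
            this.name = name;
            this.child = child;
        }
    }

    // Simplified group-by node that only records its grouping keys.
    static class GroupBy extends PlanNode {
        final List<String> groupingKeys;

        GroupBy(String name, List<String> groupingKeys, PlanNode child) {
            super(name, child);
            this.groupingKeys = groupingKeys;
        }
    }

    // Behavior described as the bug: first GroupBy found walking down from the root.
    static GroupBy findTopmostGroupBy(PlanNode root) {
        for (PlanNode n = root; n != null; n = n.child) {
            if (n instanceof GroupBy) {
                return (GroupBy) n;
            }
        }
        return null;
    }

    // Behavior described as the fix: last GroupBy found walking down from the root.
    static GroupBy findBottommostGroupBy(PlanNode root) {
        GroupBy found = null;
        for (PlanNode n = root; n != null; n = n.child) {
            if (n instanceof GroupBy) {
                found = (GroupBy) n;
            }
        }
        return found;
    }

    // Plan for: select count(*) from (select ... group by l_orderkey, l_partkey) t1
    static PlanNode examplePlan() {
        PlanNode scan = new PlanNode("scan(lineitem)", null);
        GroupBy inner = new GroupBy("inner groupby",
                Arrays.asList("l_orderkey", "l_partkey"), scan);
        PlanNode derived = new PlanNode("derived table t1", inner);
        return new GroupBy("outer groupby", new ArrayList<String>(), derived);
    }

    public static void main(String[] args) {
        PlanNode plan = examplePlan();
        // The topmost node has no grouping keys; the bottommost has two.
        System.out.println("topmost keys: " + findTopmostGroupBy(plan).groupingKeys);
        System.out.println("bottommost keys: " + findBottommostGroupBy(plan).groupingKeys);
    }
}
```

In this sketch the topmost lookup returns the outer node with an empty key list, while the bottommost lookup returns the inner node keyed on l_orderkey and l_partkey. How Tajo turns the key count into an actual shuffle output number is not shown here.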



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
