[
https://issues.apache.org/jira/browse/TAJO-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896731#comment-13896731
]
Hyunsik Choi commented on TAJO-593:
-----------------------------------
Created a review request against branch master in reviewboard
https://reviews.apache.org/r/17905/
> outer groupby and groupby in derived table causes only one shuffle output
> number
> --------------------------------------------------------------------------------
>
> Key: TAJO-593
> URL: https://issues.apache.org/jira/browse/TAJO-593
> Project: Tajo
> Issue Type: Bug
> Components: distributed query plan
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.8-incubating
>
> Attachments: TAJO-593.patch
>
>
> See the following query case:
> {code:sql}
> select count(*) from (select l_orderkey, l_partkey, count(*) from lineitem
> group by l_orderkey, l_partkey) t1;
> {code}
> In this case, SubQuery::calculateShuffleOutputNum() is called twice to
> choose the number of shuffle outputs. Each time, the method looks for a
> GroupByNode to determine the number of grouping keys. Here is the bug:
> SubQuery::calculateShuffleOutputNum() always picks the topmost GroupByNode.
> In most cases this works well, but an outer groupby combined with a groupby
> in a derived table causes the problem. In this case, we must use the
> bottom-most groupby node. Actually, that is always the correct choice.
> This patch fixes SubQuery::calculateShuffleOutputNum() to use the
> bottom-most groupby node.
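
To illustrate the difference between the two lookups, here is a minimal,
self-contained sketch. It is not Tajo's actual code: PlanNode,
findTopmostGroupBy() and findBottommostGroupBy() are hypothetical names, and
the plan is simplified to a single chain of nodes. It assumes that the outer
count(*) contributes a GroupByNode with no grouping keys, which would explain
why picking the topmost node leads to a single shuffle output.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BottomGroupBySketch {
    /** Hypothetical, simplified plan node; NOT Tajo's real LogicalNode API. */
    static class PlanNode {
        final String type;               // e.g. "GROUP_BY" or "SCAN"
        final List<String> groupingKeys; // empty for a global aggregate
        final PlanNode child;            // null for a leaf

        PlanNode(String type, List<String> groupingKeys, PlanNode child) {
            this.type = type;
            this.groupingKeys = groupingKeys;
            this.child = child;
        }
    }

    /** Returns the first GROUP_BY found walking down from the root (the old behaviour). */
    static PlanNode findTopmostGroupBy(PlanNode root) {
        for (PlanNode n = root; n != null; n = n.child) {
            if ("GROUP_BY".equals(n.type)) {
                return n;
            }
        }
        return null;
    }

    /** Returns the last GROUP_BY found walking down from the root (the fixed behaviour). */
    static PlanNode findBottommostGroupBy(PlanNode root) {
        PlanNode found = null;
        for (PlanNode n = root; n != null; n = n.child) {
            if ("GROUP_BY".equals(n.type)) {
                found = n; // keep overwriting, so the deepest match wins
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Plan shape assumed for the query in this issue:
        // outer count(*) -> inner group by (l_orderkey, l_partkey) -> scan of lineitem
        PlanNode scan = new PlanNode("SCAN", Collections.<String>emptyList(), null);
        PlanNode inner = new PlanNode("GROUP_BY", Arrays.asList("l_orderkey", "l_partkey"), scan);
        PlanNode outer = new PlanNode("GROUP_BY", Collections.<String>emptyList(), inner);

        System.out.println("topmost keys:     " + findTopmostGroupBy(outer).groupingKeys.size());
        System.out.println("bottom-most keys: " + findBottommostGroupBy(outer).groupingKeys.size());
    }
}
```

In this sketch the topmost lookup sees zero grouping keys, while the
bottom-most lookup sees the two keys of the derived table, which is the node
the description says should drive the shuffle output number.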