Hyunsik Choi created TAJO-593:
---------------------------------

             Summary: outer groupby and groupby in derived table causes only 
one shuffle output number
                 Key: TAJO-593
                 URL: https://issues.apache.org/jira/browse/TAJO-593
             Project: Tajo
          Issue Type: Bug
          Components: distributed query plan
            Reporter: Hyunsik Choi
            Assignee: Hyunsik Choi
             Fix For: 0.8-incubating


See the following query case:

{code:sql}
select count(*) from (select l_orderkey, l_partkey, count(*) from lineitem 
group by l_orderkey, l_partkey) t1;
{code}

In this case, SubQuery::calculateShuffleOutputNum() is called twice to choose 
the number of shuffle outputs. Each time, it looks for a GroupByNode to 
determine the number of grouping keys. Here is the bug: 
SubQuery::calculateShuffleOutputNum() always picks the topmost GroupByNode. In 
most cases this works well, but an outer groupby combined with a groupby in a 
derived table causes the problem. In this case, we must use the bottommost 
GroupByNode. In fact, using the bottommost one is always correct.

This patch fixes SubQuery::calculateShuffleOutputNum() to use the bottommost 
GroupByNode.
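
To illustrate the difference between the two lookups, here is a minimal sketch 
in Java. It is not Tajo's actual code: PlanNode, findTopmostGroupBy(), 
findBottommostGroupBy() and examplePlan() are hypothetical names, and the plan 
is simplified to a single-child chain. It only shows why the topmost lookup 
sees no grouping keys for the query above, while the bottommost lookup sees 
l_orderkey and l_partkey.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BottomGroupBySketch {

    /** Hypothetical, simplified plan node (not Tajo's real logical node API). */
    public static class PlanNode {
        public final String type;               // e.g. "GROUP_BY", "TABLE_SUBQUERY", "SCAN"
        public final List<String> groupingKeys; // empty unless type is "GROUP_BY" with keys
        public final PlanNode child;            // single child; null at the leaf

        public PlanNode(String type, List<String> groupingKeys, PlanNode child) {
            this.type = type;
            this.groupingKeys = groupingKeys;
            this.child = child;
        }
    }

    /** Behaviour described as buggy: returns the first group-by found from the root. */
    public static PlanNode findTopmostGroupBy(PlanNode root) {
        for (PlanNode n = root; n != null; n = n.child) {
            if ("GROUP_BY".equals(n.type)) {
                return n;
            }
        }
        return null;
    }

    /** Behaviour described as the fix: keeps walking and returns the last group-by found. */
    public static PlanNode findBottommostGroupBy(PlanNode root) {
        PlanNode found = null;
        for (PlanNode n = root; n != null; n = n.child) {
            if ("GROUP_BY".equals(n.type)) {
                found = n;
            }
        }
        return found;
    }

    /** Assumed plan shape for: select count(*) from (select ... group by l_orderkey, l_partkey) t1 */
    public static PlanNode examplePlan() {
        PlanNode scan = new PlanNode("SCAN", Collections.<String>emptyList(), null);
        PlanNode inner = new PlanNode("GROUP_BY", Arrays.asList("l_orderkey", "l_partkey"), scan);
        PlanNode derived = new PlanNode("TABLE_SUBQUERY", Collections.<String>emptyList(), inner);
        // The outer count(*) has no grouping keys.
        return new PlanNode("GROUP_BY", Collections.<String>emptyList(), derived);
    }

    public static void main(String[] args) {
        PlanNode plan = examplePlan();
        System.out.println("topmost keys: " + findTopmostGroupBy(plan).groupingKeys);
        System.out.println("bottommost keys: " + findBottommostGroupBy(plan).groupingKeys);
    }
}
```

With this assumed plan shape, the topmost lookup returns the outer groupby, 
which has an empty key list, and the bottommost lookup returns the groupby 
inside the derived table with its two keys.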



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
