Christon DeWan created PIG-3287:
-----------------------------------

             Summary: MultiQueryOptimizer can prevent CombinerOptimizer from working
                 Key: PIG-3287
                 URL: https://issues.apache.org/jira/browse/PIG-3287
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.10.1
            Reporter: Christon DeWan


The CombinerOptimizer is not applied to the script below. As a result, no 
map-side combining takes place and all aggregation work is done in the 
reducer(s), killing performance. Removing one STORE, or refactoring the query 
to use a single FOREACH after the GROUP, allows the CombinerOptimizer to work.

{noformat}
%declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i 5; done) | hadoop fs -put - /tmp/test_data.tsv; true'`

s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);

grouped = GROUP s BY g;

counted = FOREACH grouped GENERATE flatten($0), COUNT_STAR($1);
STORE counted INTO '/tmp/test_count';
summed = FOREACH grouped GENERATE flatten($0), SUM($1.n);
STORE summed INTO '/tmp/test_sum';

FS -rmr /tmp/test_{data.tsv,count,sum}
{noformat}
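
For illustration, here is one way the single-FOREACH refactoring mentioned above might look. This is only a sketch: the alias "agg" and the field names "cnt" and "total" are made up for this example, and the resulting plan has not been checked to confirm that the combiner is actually used.

{noformat}
s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);

grouped = GROUP s BY g;

-- Compute both aggregates in one FOREACH directly after the GROUP.
-- 'agg', 'cnt' and 'total' are hypothetical names used only in this sketch.
agg = FOREACH grouped GENERATE flatten($0) AS g, COUNT_STAR($1) AS cnt, SUM($1.n) AS total;

-- Project each result out of the combined relation before storing.
counted = FOREACH agg GENERATE g, cnt;
STORE counted INTO '/tmp/test_count';
summed = FOREACH agg GENERATE g, total;
STORE summed INTO '/tmp/test_sum';
{noformat}

Running EXPLAIN on "counted" or "summed" should show whether the map-reduce plan contains a combine phase.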

