[ https://issues.apache.org/jira/browse/PIG-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648815#comment-13648815 ]

Daniel Dai commented on PIG-3287:
---------------------------------

Yes, we prioritize multiquery over the combiner, though this is not optimal in 
some cases. Please use "-M" to disable multiquery. Multiquery could be improved 
to allow the combiner, but that would be a new feature and involves nontrivial 
work.
                
> MultiQueryOptimizer can prevent CombinerOptimizer from working
> --------------------------------------------------------------
>
>                 Key: PIG-3287
>                 URL: https://issues.apache.org/jira/browse/PIG-3287
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Christon DeWan
>
> The CombinerOptimizer does not operate on the script below. As a result, all 
> work is done in the reducer(s), killing performance. Removing one STORE or 
> refactoring the query to use a single FOREACH after the group allows the 
> CombinerOptimizer to work.
> {noformat}
> %declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i 5; done) 
> | hadoop fs -put - /tmp/test_data.tsv; true'`
> s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);
> grouped = GROUP s BY g;
> counted = FOREACH grouped GENERATE flatten($0), COUNT_STAR($1);
> STORE counted INTO '/tmp/test_count';
> summed = FOREACH grouped GENERATE flatten($0), SUM($1.n);
> STORE summed INTO '/tmp/test_sum';
> FS -rmr /tmp/test_{data.tsv,count,sum}
> {noformat}
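
A minimal sketch of the single-FOREACH workaround mentioned in the description, assuming one combined output is acceptable in place of the two separate ones. The alias "aggregated" and the path /tmp/test_agg are illustrative and do not come from the original report:
{noformat}
s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);
grouped = GROUP s BY g;
-- Both aggregates are computed in one FOREACH, so only a single STORE
-- depends on the GROUP and the multiquery merge is not involved.
aggregated = FOREACH grouped GENERATE flatten($0), COUNT_STAR($1), SUM($1.n);
STORE aggregated INTO '/tmp/test_agg';
{noformat}
Each output row then holds the group key, the count, and the sum. Running EXPLAIN on the final alias should show whether a combine plan was actually generated.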

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
