Christon DeWan created PIG-3287:
-----------------------------------
Summary: MultiQueryOptimizer can prevent CombinerOptimizer from working
Key: PIG-3287
URL: https://issues.apache.org/jira/browse/PIG-3287
Project: Pig
Issue Type: Bug
Affects Versions: 0.10.1
Reporter: Christon DeWan
The CombinerOptimizer does not operate on the script below. As a result, all
work is done in the reducer(s), killing performance. Removing one STORE or
refactoring the query to use a single FOREACH after the group allows the
CombinerOptimizer to work.
{noformat}
%declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i 5; done) | hadoop fs -put - /tmp/test_data.tsv; true'`
s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);
grouped = GROUP s BY g;
counted = FOREACH grouped GENERATE flatten($0), COUNT_STAR($1);
STORE counted INTO '/tmp/test_count';
summed = FOREACH grouped GENERATE flatten($0), SUM($1.n);
STORE summed INTO '/tmp/test_sum';
FS -rmr /tmp/test_{data.tsv,count,sum}
{noformat}
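This is a sketch of the single-FOREACH workaround described above. It assumes the refactoring means computing both aggregates in one FOREACH directly after the GROUP and then projecting each output from that relation. The relation name {{both}} is hypothetical. This has not been verified against the plan Pig 0.10.1 actually produces. Running EXPLAIN on the script should show whether the combine plan is populated.

{noformat}
s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);
grouped = GROUP s BY g;
-- Hypothetical workaround: compute both algebraic aggregates in the one
-- FOREACH that directly follows the GROUP, so no split sits between them
both = FOREACH grouped GENERATE flatten($0), COUNT_STAR($1), SUM($1.n);
-- Project each result from the combined relation before storing it
counted = FOREACH both GENERATE $0, $1;
STORE counted INTO '/tmp/test_count';
summed = FOREACH both GENERATE $0, $2;
STORE summed INTO '/tmp/test_sum';
{noformat}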