Gianmarco De Francisci Morales created PIG-2932: ---------------------------------------------------
Summary: Setting high default_parallel causes IOException in local mode
Key: PIG-2932
URL: https://issues.apache.org/jira/browse/PIG-2932
Project: Pig
Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Priority: Critical

So far this bug has only been confirmed in local mode. When default_parallel is set to a high value, Pig fails on some operations. The following data and script reproduce the bug.

Data:
{code}
grunt> cat file.txt
11	1	qwer
12	2	qwerty
13	3	ert
13	3	ertyu
14	4	zxcv
16	6	fsdfg
16	6	fdfghj
18	8	fjklopi
{code}

Script:
{code}
SET default_parallel 9
a = load 'file.txt' as (id1:int, id2:int, str:chararray);
b = group a by (id1,id2);
c = foreach b generate flatten(group), a;
d = order c by group::id1 ASC, group::id2 ASC;
dump d
{code}

Error:
{code}
2012-09-26 15:28:13,230 [Thread-32] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] C:  R: 
2012-09-26 15:28:13,232 [Thread-32] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
{code}

The script succeeds if default_parallel is set to 2. My guess is that the failure occurs when default_parallel is higher than the number of unique keys, probably because of some quirk in ORDER BY.
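The guess about unique keys can be checked against the sample data. The following is a minimal Python sketch, not Pig code, and it assumes file.txt is tab-separated (the PigStorage default). It counts the distinct (id1, id2) keys that `group a by (id1,id2)` would produce and compares that count with the two default_parallel values that were tried. The helpers `distinct_keys` and `partition_in_range` are hypothetical names. The latter is an assumed model of Hadoop's map-side rule that a record's partition index must lie in [0, number of reduce tasks).

```python
# Hypothetical check of the unique-keys guess; not part of Pig or Hadoop.
# Assumes file.txt is tab-separated, as PigStorage expects by default.

DATA = (
    "11\t1\tqwer\n"
    "12\t2\tqwerty\n"
    "13\t3\tert\n"
    "13\t3\tertyu\n"
    "14\t4\tzxcv\n"
    "16\t6\tfsdfg\n"
    "16\t6\tfdfghj\n"
    "18\t8\tfjklopi\n"
)


def distinct_keys(text):
    """Return the set of (id1, id2) tuples, mirroring `group a by (id1,id2)`."""
    keys = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        id1, id2, _ = line.split("\t")
        keys.add((int(id1), int(id2)))
    return keys


def partition_in_range(partition, num_reduce_tasks):
    """Assumed model of Hadoop's check: valid indices are 0..num_reduce_tasks-1."""
    return 0 <= partition < num_reduce_tasks


if __name__ == "__main__":
    n = len(distinct_keys(DATA))
    for parallel in (9, 2):
        print(f"default_parallel={parallel}, distinct keys={n}, "
              f"parallel > keys: {parallel > n}")
    # The failing record went to partition 1; under the assumed rule this is
    # rejected only if the job actually ran with at most one reduce task.
    print("partition 1 valid with 1 reducer:", partition_in_range(1, 1))
```

With six distinct keys, 9 exceeds the key count and 2 does not, which is consistent with the guess but does not prove it. The trailing `(1)` in the IOException appears to be the partition index that was rejected.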