[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2932:
-------------------------------

    Assignee: Cheolsoo Park
      Status: Patch Available  (was: Open)

> Setting high default_parallel causes IOException in local mode
> --------------------------------------------------------------
>
>                 Key: PIG-2932
>                 URL: https://issues.apache.org/jira/browse/PIG-2932
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Cheolsoo Park
>            Priority: Critical
>         Attachments: PIG-2932.patch
>
>
> This bug has been confirmed only in local mode.
> When setting a high default_parallel, Pig fails on some operations.
> The following data and script reproduce the bug.
> Data:
> {code}
> grunt> cat file.txt
> 11 1 qwer
> 12 2 qwerty
> 13 3 ert
> 13 3 ertyu
> 14 4 zxcv
> 16 6 fsdfg
> 16 6 fdfghj
> 18 8 fjklopi
> {code}
> Script:
> {code}
> SET default_parallel 9
> a = load 'file.txt' as (id1:int, id2:int, str:chararray);
> b = group a by (id1,id2);
> c = foreach b generate flatten(group), a;
> d = order c by group::id1 ASC, group::id2 ASC;
> dump d
> {code}
> Error:
> {code}
> 2012-09-26 15:28:13,230 [Thread-32] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] C: R:
> 2012-09-26 15:28:13,232 [Thread-32] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
> java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> {code}
> The script succeeds if default_parallel is set to 2.
> I guess it depends on the fact that the default_parallel is higher than the
> number of unique keys, probably some quirk with ORDER BY.
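
The trailing "(1)" in the exception message looks like the partition index that was rejected for key (12,2). A minimal sketch of how such a rejection can arise, assuming (this is not confirmed by the report) that the ORDER BY partitioner was planned for more partitions than the local job actually ran with. The class `IllegalPartitionSketch`, its methods, and the cut point are hypothetical illustrations, not Pig or Hadoop code:

```java
import java.io.IOException;

public class IllegalPartitionSketch {

    // A partition index is legal only if it falls in [0, numPartitions).
    static boolean isLegal(int partition, int numPartitions) {
        return partition >= 0 && partition < numPartitions;
    }

    // Hypothetical stand-in for a map-side collector's bounds check.
    static void collect(String key, int partition, int numPartitions) throws IOException {
        if (!isLegal(partition, numPartitions)) {
            throw new IOException("Illegal partition for " + key + " (" + partition + ")");
        }
    }

    // Hypothetical range partitioner: returns how many cut points are
    // strictly smaller than the key, so N cut points yield N + 1 ranges.
    static int rangePartition(int key, int[] cutPoints) {
        int p = 0;
        while (p < cutPoints.length && key > cutPoints[p]) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        int[] cutPoints = {11};   // assumed: cut points planned for more than one reducer
        int actualPartitions = 1; // assumed: the job really runs with a single partition
        for (int id1 : new int[] {11, 12}) {
            int partition = rangePartition(id1, cutPoints);
            try {
                collect(String.valueOf(id1), partition, actualPartitions);
                System.out.println(id1 + " accepted in partition " + partition);
            } catch (IOException e) {
                System.out.println(id1 + " rejected: " + e.getMessage());
            }
        }
    }
}
```

Under these assumptions the first key lands in partition 0 and is accepted, while the next key is sent to partition 1 and rejected. Whether the attached patch addresses this particular mismatch is not stated in the message above.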