[jira] [Commented] (PIG-2932) Setting high default_parallel causes IOException in local mode

Cheolsoo Park (JIRA) Wed, 26 Sep 2012 19:35:10 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464380#comment-13464380
 ]


Cheolsoo Park commented on PIG-2932:
------------------------------------

In fact, the example script works fine with MR2 in local mode, so that makes me 
believe that this is an issue with LocalJobRunner of hadoop 0.20. Please see 
MAPREDUCE-1367.

The best fix is probably disabling the tests in local mode. I don't think that 
there is a way to disable tests only when hadoopversion == 20 && execonly == 
local.

Thanks!
                
> Setting high default_parallel causes IOException in local mode
> --------------------------------------------------------------
>
>                 Key: PIG-2932
>                 URL: https://issues.apache.org/jira/browse/PIG-2932
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Priority: Critical
>
> This bug has been confirmed only in local mode.
> When setting a high default_parallel, Pig fails on some operations.
> The following data and script reproduce the bug.
> Data:
> {code}
> grunt> cat file.txt                                  
> 11    1       qwer
> 12    2       qwerty
> 13    3       ert
> 13    3       ertyu
> 14    4       zxcv
> 16    6       fsdfg
> 16    6       fdfghj
> 18    8       fjklopi
> {code}
> Script:
> {code}
> SET default_parallel 9
> a = load 'file.txt' as (id1:int, id2:int, str:chararray);
> b = group a by (id1,id2);
> c = foreach b generate flatten(group), a;
> d = order c by group::id1 ASC, group::id2 ASC;
> dump d
> {code}
> Error:
> {code}
> 2012-09-26 15:28:13,230 [Thread-32] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map
>  - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] 
> C:  R: 
> 2012-09-26 15:28:13,232 [Thread-32] WARN  
> org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
> java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
>       at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
>       at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
>       at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>       at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> {code}
> The script succeeds if default_parallel is set to 2.
> I guess it depends on the fact that the default_parallel is higher than the 
> number of unique keys, probably some quirk with ORDER BY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2932) Setting high default_parallel causes IOException in local mode

Reply via email to