Christon DeWan created PIG-3469:
-----------------------------------

             Summary: Skewed join can cause unrecoverable NullPointerException 
when one of its inputs is missing.
                 Key: PIG-3469
                 URL: https://issues.apache.org/jira/browse/PIG-3469
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.11
         Environment: Apache Pig version 0.11.0-cdh4.4.0
Happens in both local execution environment (os x) and cluster environment 
(linux)
            Reporter: Christon DeWan


Run this script in the local execution environment (affects cluster mode too):
{{
%declare DATA_EXISTS /tmp/test_data_exists.tsv
%declare DATA_MISSING /tmp/test_data_missing.tsv
%declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i; done) > 
/tmp/test_data_exists.tsv; true'`

exists = LOAD '$DATA_EXISTS' AS (a:long);
missing = LOAD '$DATA_MISSING' AS (a:long);

missing = FOREACH ( GROUP missing BY a ) GENERATE $0 AS a, COUNT_STAR($1);

joined = JOIN exists BY a, missing BY a USING 'skewed';

STORE joined INTO '/tmp/test_out.tsv';
}}

Results in NullPointerException which halts entire pig execution, including 
unrelated jobs. Expected: only dependencies of the error'd LOAD statement 
should fail. 

Error:
{{
2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2017: Internal error creating job configuration.
2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
 ERROR 2017: Internal error creating job configuration.
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:848)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
        at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
        at org.apache.pig.PigServer.execute(PigServer.java:1241)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
        at 
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:604)
        at org.apache.pig.Main.main(Main.java:157)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:868)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480)
        ... 17 more

}}

Script above is as small as I can make it while still reproducing the issue. 
Removing the group-foreach causes the join to fail harmlessly (not stopping pig 
execution), as does using the default join. Did not occur on 0.10.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to