[
https://issues.apache.org/jira/browse/PIG-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xuefu Zhang updated PIG-3469:
-----------------------------
Resolution: Fixed
Fix Version/s: 0.13.0
Status: Resolved (was: Patch Available)
Patch committed to trunk. Thanks, Jarces!
> Skewed join can cause unrecoverable NullPointerException when one of its
> inputs is missing.
> -------------------------------------------------------------------------------------------
>
> Key: PIG-3469
> URL: https://issues.apache.org/jira/browse/PIG-3469
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Environment: Apache Pig version 0.11.0-cdh4.4.0
> Happens in both local execution environment (os x) and cluster environment
> (linux)
> Reporter: Christon DeWan
> Assignee: Jarek Jarcec Cecho
> Fix For: 0.13.0
>
> Attachments: PIG-3469.patch, PIG-3469.patch, PIG-3469.patch
>
>
> Run this script in the local execution environment (affects cluster mode too):
> {noformat}
> %declare DATA_EXISTS /tmp/test_data_exists.tsv
> %declare DATA_MISSING /tmp/test_data_missing.tsv
> %declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i; done) >
> /tmp/test_data_exists.tsv; true'`
> exists = LOAD '$DATA_EXISTS' AS (a:long);
> missing = LOAD '$DATA_MISSING' AS (a:long);
> missing = FOREACH ( GROUP missing BY a ) GENERATE $0 AS a, COUNT_STAR($1);
> joined = JOIN exists BY a, missing BY a USING 'skewed';
> STORE joined INTO '/tmp/test_out.tsv';
> {noformat}
> Results in NullPointerException which halts entire pig execution, including
> unrelated jobs. Expected: only dependencies of the error'd LOAD statement
> should fail.
> Error:
> {noformat}
> 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2017: Internal error creating job configuration.
> 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
> ERROR 2017: Internal error creating job configuration.
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:848)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
> at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
> at org.apache.pig.PigServer.execute(PigServer.java:1241)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
> at
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> at org.apache.pig.Main.run(Main.java:604)
> at org.apache.pig.Main.main(Main.java:157)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.NullPointerException
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:868)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480)
> ... 17 more
> {noformat}
> Script above is as small as I can make it while still reproducing the issue.
> Removing the group-foreach causes the join to fail harmlessly (not stopping
> pig execution), as does using the default join. Did not occur on 0.10.1.
--
This message was sent by Atlassian JIRA
(v6.1#6144)