[ 
https://issues.apache.org/jira/browse/PIG-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092020#comment-14092020
 ] 

Cheolsoo Park commented on PIG-4112:
------------------------------------

The problem is that UnionOptimizer runs before TezDagBuilder, so POPackage has 
not yet been replaced with POShuffleTezLoad when the optimizer runs. After 
removing the union, the optimizer needs to replace the input keys of the 
union's successors. But since it looks up the inputs of each successor with 
{{PlanHelper.getPhysicalOperators(succ.plan, TezInput.class)}}, it does not 
replace the input keys of POPackages (which will later be replaced by 
POShuffleTezLoad). Here is the code:
{code}
            for (TezOperator succ : successors) {
                LinkedList<TezInput> inputs =
                        PlanHelper.getPhysicalOperators(succ.plan, TezInput.class);
                for (TezInput input : inputs) {
                    for (String key : input.getTezInputs()) {
                        if (key.equals(unionOpKey)) {
                            input.replaceInput(key,
                                    newOutputKeys[unionOutputKeys.indexOf(succ.getOperatorKey().toString())]);
                        }
                    }
                }
                tezPlan.disconnect(unionOp, succ);
            }
{code}
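
To illustrate why the type-filtered lookup misses POPackage, here is a minimal, self-contained sketch. It is not Pig code: {{InputLike}}, {{PackageOp}}, {{LoadOp}}, {{findByType}} and {{rewriteKeys}} are hypothetical stand-ins for TezInput, POPackage, an operator that already implements TezInput, the {{PlanHelper}} lookup, and the optimizer loop above. The assumption is only that an operator which does not implement the input interface is invisible to the lookup, so its key is left pointing at the removed union.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionKeySketch {
    /** Hypothetical stand-in for TezInput: only these operators are visible to the rewrite. */
    interface InputLike {
        List<String> getInputKeys();
        void replaceInput(String oldKey, String newKey);
    }

    /** Hypothetical stand-in for POPackage before TezDagBuilder runs: NOT an InputLike. */
    static class PackageOp {
        final List<String> inputKeys = new ArrayList<>();
        PackageOp(String... keys) {
            for (String k : keys) inputKeys.add(k);
        }
    }

    /** Hypothetical stand-in for an operator that already implements TezInput. */
    static class LoadOp implements InputLike {
        private final List<String> inputKeys = new ArrayList<>();
        LoadOp(String... keys) {
            for (String k : keys) inputKeys.add(k);
        }
        public List<String> getInputKeys() { return inputKeys; }
        public void replaceInput(String oldKey, String newKey) {
            int i = inputKeys.indexOf(oldKey);
            if (i >= 0) inputKeys.set(i, newKey);
        }
    }

    /** Mimics a type-filtered lookup over a plan: returns only operators of the given type. */
    static <T> List<T> findByType(List<Object> plan, Class<T> type) {
        List<T> found = new ArrayList<>();
        for (Object op : plan) {
            if (type.isInstance(op)) found.add(type.cast(op));
        }
        return found;
    }

    /** Rewrites unionKey to newKey on every operator the filter can see; returns the count. */
    static int rewriteKeys(List<Object> plan, String unionKey, String newKey) {
        int rewritten = 0;
        for (InputLike input : findByType(plan, InputLike.class)) {
            // Iterate over a copy so replaceInput can safely mutate the original list.
            for (String key : new ArrayList<>(input.getInputKeys())) {
                if (key.equals(unionKey)) {
                    input.replaceInput(key, newKey);
                    rewritten++;
                }
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        PackageOp pkg = new PackageOp("union-1");
        LoadOp load = new LoadOp("union-1");
        List<Object> plan = new ArrayList<>();
        plan.add(pkg);
        plan.add(load);

        int n = rewriteKeys(plan, "union-1", "scope-7");
        System.out.println("rewritten=" + n);                 // rewritten=1
        System.out.println("load keys=" + load.getInputKeys()); // load keys=[scope-7]
        System.out.println("package keys=" + pkg.inputKeys);    // package keys=[union-1]
    }
}
```

In this sketch the package operator keeps the stale {{union-1}} key after the rewrite, which is the same shape as the situation described above: the key survives until the operator is later converted, and by then the union it refers to is gone.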

> NPE in packager when union + group-by followed by replicated join in Tez
> ------------------------------------------------------------------------
>
>                 Key: PIG-4112
>                 URL: https://issues.apache.org/jira/browse/PIG-4112
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>
> To reproduce the error, run the following query:
> {code}
> A = load 'foo' as (id:int, fruit);
> B = load 'foo' as (id:int, fruit);
> C = union A, B;
> D = group C by id;
> E = load 'foo' as (id:int, fruit);
> F = join D by group, E by id using 'replicated';
> dump F;
> {code}
> Here is the stack trace:
> {code}
> Error: Failure while running task:java.lang.NullPointerException
> : at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.Packager.getValueTuple(Packager.java:215)
> : at org.apache.pig.backend.hadoop.executionengine.tez.POShuffleTezLoad.getNextTuple(POShuffleTezLoad.java:179)
> : at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:301)
> : at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNextTuple(POFRJoin.java:270)
> : at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:301)
> : at org.apache.pig.backend.hadoop.executionengine.tez.POStoreTez.getNextTuple(POStoreTez.java:113)
> : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.runPipeline(PigProcessor.java:317)
> : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.run(PigProcessor.java:196)
> : at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
> : at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
> : at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> : at java.security.AccessController.doPrivileged(Native Method)
> : at javax.security.auth.Subject.doAs(Subject.java:415)
> : at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> : at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
> : at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
> : at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> : at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> : at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> : at java.lang.Thread.run(Thread.java:744)
> {code}



