[ 
https://issues.apache.org/jira/browse/PIG-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5372:
------------------------------
    Attachment: pig-5372-v2.patch

Thanks [~daijy]! 

{quote}
We can change MapRedUtil.loadPartitionFileFromLocalCache to retrieve 
fs.file.impl/fs.hdfs.impl from mapConf. Then we don't need overwrite 
PigMapReduce.sJobConf in SkewedPartitioner.setConf.
{quote}
Attaching {{pig-5372-v2.patch}} which does this.
Ideally we should have a test case for "fs.file.impl/fs.hdfs.impl " but I fail 
to see when we ever want to disable cache for fs.file&fs.hdfs in the first 
place.

> SAMPLE/RANDOM(udf) before skewed join failing with NPE
> ------------------------------------------------------
>
>                 Key: PIG-5372
>                 URL: https://issues.apache.org/jira/browse/PIG-5372
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5372-v1.patch, pig-5372-v2.patch
>
>
> Sample short code like below
> {code}
> A = LOAD 'input.txt' AS (a1:int, a2:chararray, a3:int);
> B = LOAD 'input.txt' AS (b1:int, b2:chararray, b3:int);
> A2 = FOREACH A generate *, RANDOM() as randnum;
> D = join A2 by a1, B by b1 using 'skewed' parallel 2;
> store D into '$output';
> {code}
> Fails with NPE. 
> {noformat}
> 2018-12-12 16:06:04,860 [Dispatcher thread: Central] INFO  
> org.apache.tez.dag.history.HistoryEventHandler - 
> [HISTORY][DAG:dag_1544648742542_0001_1][Event:TASK_FINISHED]: 
> vertexName=scope-55, taskId=task_1544648742542_0001_1_02_000000, 
> startTime=1544648745036, finishTime=1544648764857, timeTaken=19821, 
> status=KILLED, successfulAttemptID=null, diagnostics=TaskAttempt 0 failed, 
> info=[Error: Failure while running 
> task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing (Name: Local Rearrange[tuple]{int}(false) - scope-29 ->       
> scope-58 Operator Key: scope-29): 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception 
> while executing [POUserFunc (Name: 
> POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: 
> scope-40) children: null at []]: java.lang.NullPointerException
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
>         at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:131)
>         at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:420)
>         at 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:282)
>         at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>         at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>         at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>         at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> Exception while executing [POUserFunc (Name: 
> POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: 
> scope-40) children: null at []]: java.lang.NullPointerException
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:367)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
>         ... 17 more
> Caused by: java.lang.NullPointerException
>         at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:51)
>         at org.apache.pig.builtin.RANDOM.exec(RANDOM.java:37)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:332)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:396)
>         at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:343)
>         ... 20 more
> ]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to