[
https://issues.apache.org/jira/browse/PIG-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818623#comment-13818623
]
Cheolsoo Park commented on PIG-3478:
------------------------------------
[~jeremykarn], I ran the e2e tests (StreamingPythonUDFs) on an EMR Hadoop 2.2
cluster and saw the following two issues:
# NPE in StreamingUDF.java
{code}
2013-11-10 22:32:19,694 FATAL [IPC Server handler 11 on 33809]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
attempt_1383086282107_1892_m_000000_3 - exited :
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while
executing [POUserFunc (Name:
POUserFunc(org.apache.pig.impl.builtin.StreamingUDF)[int] - scope-3 Operator
Key: scope-3) children: null at []]: java.lang.NullPointerException
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.NullPointerException
at
org.apache.pig.impl.builtin.StreamingUDF.ensureUserFileAvailable(StreamingUDF.java:249)
at
org.apache.pig.impl.builtin.StreamingUDF.constructCommand(StreamingUDF.java:218)
at
org.apache.pig.impl.builtin.StreamingUDF.startUdfController(StreamingUDF.java:163)
at
org.apache.pig.impl.builtin.StreamingUDF.initialize(StreamingUDF.java:156)
at org.apache.pig.impl.builtin.StreamingUDF.exec(StreamingUDF.java:146)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextInteger(POUserFunc.java:379)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:321)
... 13 more
{code}
The NPE is thrown from {{udfFileStream.close();}} because {{udfFileStream}} is null.
# After fixing #1 by adding a null check, I ran into this error:
{code}
2013-11-10 23:00:51,402 FATAL [IPC Server handler 11 on 40139]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
attempt_1383086282107_1905_m_000000_3 - exited :
org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error
from UDF: StreamingUDF [Could not create directory:
/home/hadoop/.versions/2.2.0/logs/udfOutput]
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:358)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextInteger(POUserFunc.java:379)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:321)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.IOException: Could not create directory:
/home/hadoop/.versions/2.2.0/logs/udfOutput
at
org.apache.pig.scripting.ScriptingOutputCapturer.getTaskLogDir(ScriptingOutputCapturer.java:104)
at
org.apache.pig.scripting.ScriptingOutputCapturer.getStandardOutputRootWriteLocation(ScriptingOutputCapturer.java:86)
at
org.apache.pig.impl.builtin.StreamingUDF.constructCommand(StreamingUDF.java:187)
at
org.apache.pig.impl.builtin.StreamingUDF.startUdfController(StreamingUDF.java:163)
at org.apache.pig.impl.builtin.StreamingUDF.initialize(StreamingUDF.java:156)
at org.apache.pig.impl.builtin.StreamingUDF.exec(StreamingUDF.java:146)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
... 15 more
{code}
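For the second failure, the trace only says that the directory could not be created, not why. As a rough illustration (not the actual {{ScriptingOutputCapturer}} code), the Java sketch below creates a directory in a way that keeps the underlying cause: {{File.mkdirs()}} only returns false on failure, whereas {{Files.createDirectories}} throws an {{IOException}} describing the problem. The class {{LogDirCheck}} and the method {{ensureDir}} are hypothetical names, and whether the cause on EMR is permissions or an unexpected log path is an open question here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only; not part of Pig.
final class LogDirCheck {
    private LogDirCheck() {}

    /**
     * Creates the directory (and any missing parents) if it does not exist.
     * On failure, rethrows with the original IOException attached as the
     * cause, so the reason (for example, access denied) is not lost.
     */
    static Path ensureDir(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not create directory: " + dir, e);
        }
    }
}
```

Calling {{ensureDir}} on a directory that already exists is a no-op, so it is safe to call once per task.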
Can you look into these failures? We should also enable {{StreamingPythonUDFs}}
tests in nightly.conf once they're fixed.
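The null check mentioned for the first failure could look like the following minimal Java sketch. It is not the actual patch: {{NullSafeClose}} and {{closeIfOpen}} are hypothetical names used only to show the pattern of closing a stream that may never have been opened.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only; not part of Pig.
final class NullSafeClose {
    private NullSafeClose() {}

    /**
     * Closes the stream only if it was actually opened.
     * Returns true if close() was called, false if the stream was null.
     */
    static boolean closeIfOpen(Closeable stream) {
        if (stream == null) {
            // Nothing was opened, so there is nothing to close.
            return false;
        }
        try {
            stream.close();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A null check avoids the NPE but can hide why the stream was null in the first place, so failing with a clear message when the UDF file cannot be opened may be the better long-term behavior.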
> Make StreamingUDF work for Hadoop 2
> -----------------------------------
>
> Key: PIG-3478
> URL: https://issues.apache.org/jira/browse/PIG-3478
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Daniel Dai
> Assignee: Jeremy Karn
> Fix For: 0.12.1
>
> Attachments: PIG-3478.patch
>
>
> PIG-2417 introduced Streaming UDF. However, it does not work under Hadoop 2.
> Both the unit tests and the e2e tests fail under Hadoop 2. We need to fix it.