[
https://issues.apache.org/jira/browse/PIG-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4944:
----------------------------------
Attachment: PIG-4944.patch
Here is why SparkLauncher#resetUDFContext needs to reset the job configuration
by calling UDFContext.getUDFContext().addJobConf(null).
If the value is not reset, the IOException in the following test load function
is thrown:
{code:title=org.apache.pig.test.TestEvalPipelineLocal.SetLocationTestLoadFunc}
public static class SetLocationTestLoadFunc extends PigStorage {
    String suffix = "test";

    public SetLocationTestLoadFunc() {
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        super.setLocation(location, job);
        Properties p =
            UDFContext.getUDFContext().getUDFProperties(this.getClass());
        if (UDFContext.getUDFContext().isFrontend()) {
            p.setProperty("t_" + signature, "test");
        } else {
            if (p.getProperty("t_" + signature) == null)
                throw new IOException("property expected"); // This exception is thrown
        }
    }
}
{code}
Interestingly, this reset (UDFContext.getUDFContext().addJobConf(null)) is not
needed in mr mode, but it is needed in spark and tez mode (see
[org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder#addCombiner|https://github.com/apache/pig/blob/7cf1a945772f49ff620d7eab75bf2c7e635ab2ae/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1008]).
The reason is that the UDFContext returned by UDFContext.getUDFContext() is
held in a thread-local variable, so each thread sees its own instance.
In mr mode, UDFContext.getUDFContext().addJobConf(null) is called in the main
thread, so UDFContext.getUDFContext().getJobConf() is still null when
TestEvalPipelineLocal.SetLocationTestLoadFunc#setLocation is called in the main
thread.
The behavior differs in spark mode:
UDFContext.getUDFContext().addJobConf(null) and then
UDFContext.getUDFContext().addJobConf(not null) are both called in the main
thread, so when
org.apache.pig.test.TestEvalPipelineLocal.SetLocationTestLoadFunc#setLocation
is called afterwards in the main thread, the job configuration is not null and
the exception is thrown.
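The thread-local effect can be illustrated with a small standalone sketch. It does not use Pig at all: Ctx is a hypothetical stand-in for UDFContext, java.util.Properties stands in for the job configuration, and treating "job configuration is null" as "frontend" is a simplifying assumption about the real isFrontend() check. The mr-like case also assumes that the non-null configuration is attached on a different thread.

```java
import java.util.Properties;

public class ThreadLocalContextSketch {

    /** Hypothetical stand-in for UDFContext; this is not Pig's real class. */
    public static final class Ctx {
        // One Ctx instance per thread, created lazily on first access.
        private static final ThreadLocal<Ctx> TLS = ThreadLocal.withInitial(Ctx::new);
        private Properties jobConf; // null models "no job configuration attached"

        public static Ctx get() { return TLS.get(); }

        public void addJobConf(Properties conf) { this.jobConf = conf; }

        // Assumption: a null job configuration means we are on the frontend.
        public boolean isFrontend() { return jobConf == null; }
    }

    /** Spark-like sequence: null and non-null are both set on the calling thread. */
    public static boolean frontendAfterSameThreadSet(boolean resetBeforeCheck) {
        Ctx.get().addJobConf(null);
        Ctx.get().addJobConf(new Properties());
        if (resetBeforeCheck) {
            Ctx.get().addJobConf(null); // the reset discussed in this issue
        }
        boolean frontend = Ctx.get().isFrontend();
        Ctx.get().addJobConf(null); // leave the calling thread clean
        return frontend;
    }

    /** mr-like sequence (assumed): the non-null value is set on another thread. */
    public static boolean frontendAfterOtherThreadSet() {
        Ctx.get().addJobConf(null);
        Thread worker = new Thread(() -> Ctx.get().addJobConf(new Properties()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The worker changed only its own thread-local Ctx, not ours.
        return Ctx.get().isFrontend();
    }

    public static void main(String[] args) {
        System.out.println("same thread, no reset: " + frontendAfterSameThreadSet(false));
        System.out.println("same thread, reset:    " + frontendAfterSameThreadSet(true));
        System.out.println("other thread set:      " + frontendAfterOtherThreadSet());
    }
}
```

In this sketch only the first case reports "not frontend", which corresponds to the branch of SetLocationTestLoadFunc#setLocation that throws when the property is missing.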
I added some logging to the code to verify the above conclusion.
In the attached TestEvalPipelineLocal.spark (spark mode):
at Line 24548 UDFContext#addJobConf(null) is called, then at Line 24621
UDFContext#addJobConf(not null) is called in the main thread, and at Line 26138
SetLocationTestLoadFunc#setLocation is called in the main thread.
In the attached TestEvalPipelineLocal.mr (mr mode):
at Line 5734 UDFContext.getUDFContext().addJobConf(null) is called in the main
thread, and at Line 6097 SetLocationTestLoadFunc#setLocation is called in the
main thread. Between Line 5734 and Line 6097,
UDFContext.getUDFContext().addJobConf(not null) is not called in the main
thread.
> Reset UDFContext#jobConf in spark mode
> --------------------------------------
>
> Key: PIG-4944
> URL: https://issues.apache.org/jira/browse/PIG-4944
> Project: Pig
> Issue Type: Bug
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Attachments: PIG-4944.patch
>
>
> The community gave some comments about the TestEvalPipelineLocal unit test:
> https://reviews.apache.org/r/45667/#comment199056
> We can reset "UDFContext.getUDFContext().addJobConf(null)" somewhere other
> than in TestEvalPipelineLocal#testSetLocationCalledInFE
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)