[ 
https://issues.apache.org/jira/browse/PIG-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4944:
----------------------------------
    Attachment: PIG-4944.patch

Here is why we need the reset (UDFContext.getUDFContext().addJobConf(null)) 
in SparkLauncher#resetUDFContext.

If we don't reset the value, the IOException in the following test load 
function is thrown:
{code:title=org.apache.pig.test.TestEvalPipelineLocal.SetLocationTestLoadFunc}
    public static class SetLocationTestLoadFunc extends PigStorage {
        String suffix = "test";
        public SetLocationTestLoadFunc() {
        }
        @Override
        public void setLocation(String location, Job job) throws IOException {
            super.setLocation(location, job);
            Properties p =
                UDFContext.getUDFContext().getUDFProperties(this.getClass());
            if (UDFContext.getUDFContext().isFrontend()) {
                p.setProperty("t_"+signature, "test");
            } else {
                if (p.getProperty("t_"+signature)==null)
                    // This is the exception that is thrown
                    throw new IOException("property expected");
            }
        }
    }
{code}


Interestingly, we do not need the reset 
(UDFContext.getUDFContext().addJobConf(null)) in mr mode, but we do need it 
in spark or tez mode 
([org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder#addCombiner|https://github.com/apache/pig/blob/7cf1a945772f49ff620d7eab75bf2c7e635ab2ae/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1008]).
The reason is that UDFContext.getUDFContext() returns a thread-local 
instance, so each thread sees its own state.  In mr mode, 
UDFContext.getUDFContext().addJobConf(null) is called in the main thread, and 
UDFContext.getUDFContext().getJobConf() is still null when 
TestEvalPipelineLocal.SetLocationTestLoadFunc#setLocation is called in the 
main thread.  The behavior differs in spark mode: 
UDFContext.getUDFContext().addJobConf(null) and then 
UDFContext.getUDFContext().addJobConf(not null) are both called in the main 
thread, so when 
org.apache.pig.test.TestEvalPipelineLocal.SetLocationTestLoadFunc#setLocation 
is called there, it takes the non-frontend branch and the exception is thrown.
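
To make the thread-local behavior described above concrete, here is a minimal sketch. It does not use Pig's classes: Context is a hypothetical stand-in for UDFContext, its isFrontend() is simplified to "jobConf is null", and the mr-like case assumes the non-null conf is set on some other thread, which is an assumption made only for illustration.

```java
import java.util.Properties;

/** Hypothetical stand-in for a thread-local context; this is not Pig's UDFContext. */
public class ThreadLocalContextSketch {

    static final class Context {
        // Each thread gets its own Context instance.
        private static final ThreadLocal<Context> TSS =
            ThreadLocal.withInitial(Context::new);
        private Properties jobConf;

        static Context get() { return TSS.get(); }
        void addJobConf(Properties conf) { this.jobConf = conf; }
        // Simplified assumption: "frontend" means no job conf has been set.
        boolean isFrontend() { return jobConf == null; }
    }

    /** mr-like case (assumed): a non-null conf is set only on another thread. */
    static boolean frontendAfterOtherThreadSetsConf() {
        Context.get().addJobConf(null);
        Thread worker = new Thread(() -> Context.get().addJobConf(new Properties()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The calling thread's context is untouched by the worker.
        return Context.get().isFrontend();
    }

    /** spark-like case: null, then a non-null conf, both on the calling thread. */
    static boolean frontendAfterSameThreadSetsConf() {
        Context.get().addJobConf(null);
        Context.get().addJobConf(new Properties());
        return Context.get().isFrontend();
    }

    /** The fix: reset the conf to null on the calling thread before the check. */
    static boolean frontendAfterReset() {
        Context.get().addJobConf(new Properties());
        Context.get().addJobConf(null);
        return Context.get().isFrontend();
    }

    public static void main(String[] args) {
        System.out.println("other thread set conf: " + frontendAfterOtherThreadSetsConf());
        System.out.println("same thread set conf:  " + frontendAfterSameThreadSetsConf());
        System.out.println("after reset:           " + frontendAfterReset());
    }
}
```

In this sketch only the same-thread case leaves the frontend check false, which corresponds to the branch of setLocation that throws; resetting the conf to null on that thread restores the frontend branch.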

I added some logging to the code to verify the above conclusion.

The attached TestEvalPipelineLocal.spark shows that in spark mode 
UDFContext#addJobConf(null) is called at Line 24548, then 
UDFContext#addJobConf(not null) is called in the main thread at Line 24621, 
and SetLocationTestLoadFunc#setLocation is called in the main thread at 
Line 26138.

The attached TestEvalPipelineLocal.mr shows that in mr mode 
UDFContext.getUDFContext().addJobConf(null) is called in the main thread at 
Line 5734 and SetLocationTestLoadFunc#setLocation is called in the main 
thread at Line 6097.  Between Line 5734 and Line 6097, 
UDFContext.getUDFContext().addJobConf(not null) is not called in the main 
thread.



> Reset UDFContext#jobConf in spark mode
> --------------------------------------
>
>                 Key: PIG-4944
>                 URL: https://issues.apache.org/jira/browse/PIG-4944
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4944.patch
>
>
> The community left some comments about the TestEvalPipelineLocal unit test:
> https://reviews.apache.org/r/45667/#comment199056
> We can do the reset "UDFContext.getUDFContext().addJobConf(null)" somewhere 
> other than TestEvalPipelineLocal#testSetLocationCalledInFE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
