[ https://issues.apache.org/jira/browse/PIG-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518816#comment-14518816 ]

liyunzhang_intel commented on PIG-4295:
---------------------------------------

[~mohitsabharwal], [~xuefuz]:
PigContext#packageImportList is a ThreadLocal variable, so it cannot be
deserialized along with the objects that use it, and it may hold different
values in different threads. In MR, however, it is initialized correctly in
each thread as follows.

In
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase#setup:

Line 171:
PigContext.setPackageImportList((ArrayList<String>)ObjectSerializer.deserialize(job.get("udf.import.list")));
  // deserializes "udf.import.list" from the configuration, then calls setPackageImportList

Line 181:    mp = (PhysicalPlan) ObjectSerializer.deserialize(
                job.get("pig.mapPlan"));   // deserializing the plan later calls POCast#readObject
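A minimal, JDK-only sketch of why the order in setup matters: the import list must already be in the current thread's ThreadLocal when an object whose readObject depends on it is deserialized. Assumptions: `SetupOrderDemo`, `IMPORTS` and `NeedsImports` are hypothetical stand-ins for PigContext's import list and an operator such as POCast; a `Map` stands in for the Hadoop job configuration; plain Java serialization plus Base64 stands in for Pig's ObjectSerializer, whose actual encoding may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SetupOrderDemo {
    // Hypothetical stand-in for PigContext's thread-local package import list.
    static final ThreadLocal<List<String>> IMPORTS = new ThreadLocal<>();

    // Hypothetical stand-in for an operator (such as POCast) whose
    // readObject needs the import list of the current thread.
    static class NeedsImports implements Serializable {
        private static final long serialVersionUID = 1L;
        final String funcName;
        transient List<String> importsSeen; // filled in during deserialization

        NeedsImports(String funcName) {
            this.funcName = funcName;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            List<String> imports = IMPORTS.get();
            if (imports == null) {
                throw new InvalidObjectException("import list not set in this thread");
            }
            importsSeen = new ArrayList<>(imports);
        }
    }

    // Simplified stand-in for Pig's ObjectSerializer: Java serialization + Base64.
    static String serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static Object deserialize(String encoded) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Mirrors the order in setup(): set the import list first, then read the plan.
    @SuppressWarnings("unchecked")
    static NeedsImports setup(Map<String, String> conf) throws IOException, ClassNotFoundException {
        IMPORTS.set((ArrayList<String>) deserialize(conf.get("udf.import.list")));
        return (NeedsImports) deserialize(conf.get("pig.mapPlan"));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put("udf.import.list",
                serialize(new ArrayList<String>(Arrays.asList("com.example.udfs."))));
        conf.put("pig.mapPlan", serialize(new NeedsImports("MyUdf")));
        NeedsImports op = setup(conf);
        System.out.println(op.funcName + " deserialized with imports " + op.importsSeen);
    }
}
```

Running `main` prints `MyUdf deserialized with imports [com.example.udfs.]`. Swapping the two lines in `setup` makes `readObject` throw `InvalidObjectException`, because the import list has not been set in that thread yet.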

My question: is there any method in Spark, like
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase#setup
in MR, that can be called to read "udf.import.list" from the configuration
before Spark deserializes objects?
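For context on why such a hook is needed: a value stored in a plain ThreadLocal by one thread is not visible from any other thread, so if deserialization runs in a thread that never set the import list, readObject will find it empty. The JDK-only sketch below shows that behavior; `ThreadLocalDemo` and `IMPORTS` are hypothetical names, not Pig code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalDemo {
    // Hypothetical stand-in for a per-thread import list such as
    // PigContext#packageImportList; this is not Pig's actual field.
    static final ThreadLocal<List<String>> IMPORTS = new ThreadLocal<>();

    /** Sets IMPORTS in the calling thread, then returns what a new thread reads. */
    static List<String> valueSeenByOtherThread() {
        IMPORTS.set(new ArrayList<String>(Arrays.asList("com.example.udfs.")));
        AtomicReference<List<String>> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(IMPORTS.get()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        // A plain ThreadLocal is not inherited by new threads, so the worker saw null.
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("worker thread sees: " + valueSeenByOtherThread());
        System.out.println("calling thread sees: " + IMPORTS.get());
    }
}
```

Running `main` prints `worker thread sees: null` followed by `calling thread sees: [com.example.udfs.]`.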

> Enable unit test "TestPigContext" for spark
> -------------------------------------------
>
>                 Key: PIG-4295
>                 URL: https://issues.apache.org/jira/browse/PIG-4295
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4295.patch, TEST-org.apache.pig.test.TestPigContext.txt
>
>
> Error log is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
