[ https://issues.apache.org/jira/browse/PIG-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518816#comment-14518816 ]
liyunzhang_intel commented on PIG-4295:
---------------------------------------
[~mohitsabharwal], [~xuefuz]:
PigContext#packageImportList is a ThreadLocal variable which cannot be
deserialized; it may have different values in different threads. In MR
mode, however, it is initialized correctly in each thread in the following
way.
In
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase#setup:
Line 171:
PigContext.setPackageImportList((ArrayList<String>)ObjectSerializer.deserialize(job.get("udf.import.list")));
// deserializes "udf.import.list" from the configuration and then calls
// setPackageImportList
Line 181: mp = (PhysicalPlan) ObjectSerializer.deserialize(
job.get("pig.mapPlan")); // this will call POCast#readObject later
My question: is there a method in Spark, similar to
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase#setup,
that can be called to read the value of "udf.import.list" from the
configuration before Spark deserializes objects?
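To illustrate the ordering problem, here is a minimal sketch in plain Java.
It does not use Pig or Spark; the class ImportListDemo, its helper methods,
and the package prefix "com.example.udfs." are all hypothetical stand-ins
(the ThreadLocal stands in for PigContext#packageImportList, and the Base64
helpers stand in for ObjectSerializer). It only shows that a new thread sees
an empty ThreadLocal until the list is restored from the configuration,
which is why the restore has to happen before anything that depends on it
is deserialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImportListDemo {
    // Hypothetical stand-in for PigContext#packageImportList: one value per thread.
    private static final ThreadLocal<ArrayList<String>> IMPORTS =
            ThreadLocal.withInitial(ArrayList::new);

    public static void setImports(ArrayList<String> list) { IMPORTS.set(list); }
    public static List<String> getImports() { return IMPORTS.get(); }

    // Hypothetical stand-in for ObjectSerializer.serialize: object -> Base64 string.
    public static String serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Hypothetical stand-in for ObjectSerializer.deserialize: Base64 string -> object.
    public static Object deserialize(String encoded)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    // Runs in a fresh thread and returns the import-list size seen
    // before and after restoring it from the configuration map.
    public static int[] importsSeenByWorker(Map<String, String> conf)
            throws Exception {
        int[] sizes = new int[2];
        Throwable[] failure = new Throwable[1];
        Thread worker = new Thread(() -> {
            try {
                // The driver thread's value is not visible here.
                sizes[0] = getImports().size();
                @SuppressWarnings("unchecked")
                ArrayList<String> restored =
                        (ArrayList<String>) deserialize(conf.get("udf.import.list"));
                // Restore first; only then deserialize anything that needs it.
                setImports(restored);
                sizes[1] = getImports().size();
            } catch (Exception e) {
                failure[0] = e;
            }
        });
        worker.start();
        worker.join();
        if (failure[0] != null) {
            throw new Exception(failure[0]);
        }
        return sizes;
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> imports = new ArrayList<>();
        imports.add("com.example.udfs."); // hypothetical package prefix
        setImports(imports);              // set on the "driver" thread only

        Map<String, String> conf = new HashMap<>();
        conf.put("udf.import.list", serialize(imports));

        int[] sizes = importsSeenByWorker(conf);
        System.out.println("before=" + sizes[0] + " after=" + sizes[1]);
    }
}
```

In this sketch the worker thread starts with an empty list and only sees
the import after the explicit restore step. Whatever hook is used on the
Spark side would need to perform that restore at an equivalent point.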
> Enable unit test "TestPigContext" for spark
> -------------------------------------------
>
> Key: PIG-4295
> URL: https://issues.apache.org/jira/browse/PIG-4295
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4295.patch,
> TEST-org.apache.pig.test.TestPigContext.txt
>
>
> error log is attached
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)