[
https://issues.apache.org/jira/browse/PIG-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548576#comment-14548576
]
Mohit Sabharwal commented on PIG-4295:
--------------------------------------
Thanks, [~kellyzly], LGTM. Couple of comments:
(1) In SparkLauncher, when saving the udf import list, I think it's safer to
include the {{udf.import.list}} property, if it's already populated, i.e.
something like
{code}
+ private void saveUdfImportList(PigContext pigContext) {
+ String propertyVal = pigContext.getProperties().get("udf.import.list");
+ String udfImportList =
Joiner.on(",").join(PigContext.getPackageImportList());
+ if (propertyVal != null) {
+ udfImportList = udfImportList + "," + propertyVal;
+ }
+ pigContext.getProperties().setProperty("spark.udf.import.list",
udfImportList);
+ }
{code}
This is because PigContext#init() obviously assumes that there is a possibility
that this property is already populated. So, we should not loose that (initial)
value for Spark engine.
(2) It's also safer I think, in both writeObject and readObject to include the
Spark specific property only conditioned on Spark engine.
{code}
if (Util.isSparkExecType(getExecutionEngine()) {
+ if (packageImportList.get() == null) {
+ String udfImportListStr =
properties.getProperty("spark.udf.import.list");
+ if (udfImportListStr != null) {
+ udfImportList =
Lists.newArrayList(Splitter.on(",").split(udfImportListStr));
+ }
+ } else {
+ udfImportList = packageImportList.get();
+ }
+ out.writeObject(udfImportList);
}
if (Util.isSparkExecType(getExecutionEngine()) {
+ ArrayList<String> udfImportList = (ArrayList<String>) in.readObject();
+ packageImportList.set(udfImportList);
}
{code}
This way there is no accidental risk of overwriting {{packageImportList}} for
MR/Tez. There is also no reason to serialize the udf import list explicitly for
MR and Tez engines.
(3) Could you also please add few comments on SparkLauncher#saveUdfImportList
for the reason behind the new property {{spark.udf.import.list}}. Thanks!
> Enable unit test "TestPigContext" for spark
> -------------------------------------------
>
> Key: PIG-4295
> URL: https://issues.apache.org/jira/browse/PIG-4295
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Affects Versions: spark-branch
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4295.patch, PIG-4295_1.patch, PIG-4295_2.patch,
> TEST-org.apache.pig.test.TestPigContext.txt
>
>
> error log is attached
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)