[jira] [Commented] (PIG-4295) Enable unit test "TestPigContext" for spark

Mohit Sabharwal (JIRA) Mon, 18 May 2015 12:14:43 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548576#comment-14548576
 ]


Mohit Sabharwal commented on PIG-4295:
--------------------------------------

Thanks, [~kellyzly], LGTM. Couple of comments:

(1) In SparkLauncher, when saving the udf import list, I think it's safer to 
include the {{udf.import.list}} property, if it's already populated, i.e. 
something like

{code}
+    private void saveUdfImportList(PigContext pigContext) {
+        String propertyVal = pigContext.getProperties().get("udf.import.list");
+        String udfImportList = 
Joiner.on(",").join(PigContext.getPackageImportList());
+        if (propertyVal != null) {
+            udfImportList = udfImportList + "," + propertyVal;
+        }
+        pigContext.getProperties().setProperty("spark.udf.import.list", 
udfImportList);
+    }
{code}

This is because PigContext#init() obviously assumes that there is a possibility 
that this property is already populated. So, we should not loose that (initial) 
value for Spark engine.

(2) It's also safer I think, in both writeObject and readObject to include the 
Spark specific property only conditioned on Spark engine.

{code}
if (Util.isSparkExecType(getExecutionEngine()) {
+        if (packageImportList.get() == null) {
+            String udfImportListStr = 
properties.getProperty("spark.udf.import.list");
+            if (udfImportListStr != null) {
+                udfImportList = 
Lists.newArrayList(Splitter.on(",").split(udfImportListStr));
+            }
+        } else {
+            udfImportList = packageImportList.get();
+        }
+        out.writeObject(udfImportList);
}

if (Util.isSparkExecType(getExecutionEngine()) {
+        ArrayList<String> udfImportList = (ArrayList<String>) in.readObject();
+        packageImportList.set(udfImportList);
}
{code}

This way there is no accidental risk of overwriting {{packageImportList}} for 
MR/Tez. There is also no reason to serialize the udf import list explicitly for 
MR and Tez engines.

(3) Could you also please add few comments on SparkLauncher#saveUdfImportList 
for the reason behind the new property {{spark.udf.import.list}}. Thanks!


> Enable unit test "TestPigContext" for spark
> -------------------------------------------
>
>                 Key: PIG-4295
>                 URL: https://issues.apache.org/jira/browse/PIG-4295
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4295.patch, PIG-4295_1.patch, PIG-4295_2.patch, 
> TEST-org.apache.pig.test.TestPigContext.txt
>
>
> error log is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PIG-4295) Enable unit test "TestPigContext" for spark

Reply via email to