[ https://issues.apache.org/jira/browse/PIG-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated PIG-4295: ---------------------------------- Attachment: PIG-4295_3.patch [~mohitsabharwal]: Thanks for your review. For your suggestions, following are my comment: {quote} (1) In SparkLauncher, when saving the udf import list, I think it's safer to include the {{udf.import.list}} property, if it's already populated, i.e. something like {code} + private void saveUdfImportList(PigContext pigContext) { + String propertyVal = pigContext.getProperties().get("udf.import.list"); + String udfImportList = Joiner.on(",").join(PigContext.getPackageImportList()); + if (propertyVal != null) { + udfImportList = udfImportList + "," + propertyVal; + } + pigContext.getProperties().setProperty("spark.udf.import.list", udfImportList); + } {code} {quote} we need not include the {{udf.import.list}} property in SparkLauncher#saveUdfImportList because before SparkLauncher#saveUdfImportList, {{udf.import.list}} will be stored in PigContext#packageImportList in PigContext#init {code} private void init() { if (properties.get("udf.import.list")!=null) PigContext.initializeImportList((String)properties.get("udf.import.list")); } {code} {quote} (2) It's also safer I think, in both writeObject and readObject to include the Spark specific property only conditioned on Spark engine. {code} if (Util.isSparkExecType(getExecutionEngine()) { + if (packageImportList.get() == null) { + String udfImportListStr = properties.getProperty("spark.udf.import.list"); + if (udfImportListStr != null) { + udfImportList = Lists.newArrayList(Splitter.on(",").split(udfImportListStr)); + } + } else { + udfImportList = packageImportList.get(); + } + out.writeObject(udfImportList); } if (Util.isSparkExecType(getExecutionEngine()) { + ArrayList<String> udfImportList = (ArrayList<String>) in.readObject(); + packageImportList.set(udfImportList); } {code} This way there is no accidental risk of overwriting {{packageImportList}} for MR/Tez. There is also no reason to serialize the udf import list explicitly for MR and Tez engines. {quote} I have changed the code according to your comment. {quote} (3) Could you also please add few comments on SparkLauncher#saveUdfImportList for the reason behind the new property {{spark.udf.import.list}}. Thanks! {quote} I have changed the code according to your comment. > Enable unit test "TestPigContext" for spark > ------------------------------------------- > > Key: PIG-4295 > URL: https://issues.apache.org/jira/browse/PIG-4295 > Project: Pig > Issue Type: Sub-task > Components: spark > Affects Versions: spark-branch > Reporter: liyunzhang_intel > Assignee: liyunzhang_intel > Fix For: spark-branch > > Attachments: PIG-4295.patch, PIG-4295_1.patch, PIG-4295_2.patch, > PIG-4295_3.patch, TEST-org.apache.pig.test.TestPigContext.txt > > > error log is attached -- This message was sent by Atlassian JIRA (v6.3.4#6332)