[ https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436324#comment-15436324 ]
liyunzhang_intel commented on PIG-4920:
---------------------------------------

[~rohini]: Since you are reviewing Pig on Spark, let me say more about PIG-4611 and PIG-4265 (both are caused by the same problem). While fixing PIG-4920 I ran into the same problem again, so here is a fuller explanation to help the review.

In Spark mode, UDFContext#udfConfs is initialized after all objects have been deserialized, whereas in MR/Tez mode it is initialized before. Let's use TestHBaseStorage#testLoadWithProjection_1 as an example. HBaseStorage#defaultCaster is set in the HBaseStorage constructor, and its value comes from UDFContext.getUDFContext().getClientSystemProps():
{code}
public HBaseStorage(String columnList, String optString) throws ParseException, IOException {
    …
    String defaultCaster = UDFContext.getUDFContext().getClientSystemProps().getProperty(CASTER_PROPERTY, STRING_CASTER);
}
{code}
In MR mode, UDFContext is initialized first (PigGenericMapBase#setup -> MapRedUtil.setupUDFContext -> UDFContext#deserialize) and the HBaseStorage constructor is called afterwards. In Spark mode, Spark deserializes all objects before starting the executor that runs the program, so the HBaseStorage constructor runs first and UDFContext is initialized only later (PigInputFormatSpark#createRecordReader -> MapRedUtil#setupUDFContext -> UDFContext#deserialize). As a result, an NPE is thrown in this situation; that is why, in PIG-4611, I check whether UDFContext.getUDFContext().getClientSystemProps().getProperty(CASTER_PROPERTY, STRING_CASTER) is null.

The solution is to serialize UDFContext#udfConfs and UDFContext#clientSystemProperties in PigContext#writeObject, when PigContext is serialized, and to deserialize them in PigContext#readObject, when PigContext is deserialized (see PIG-4920_3.patch). After *PIG-4866*, we no longer serialize PigContext into the configuration sent to the backend.
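To make the ordering problem and the writeObject/readObject fix concrete, here is a minimal, self-contained Java sketch. It uses neither Pig nor Spark: `Ctx`, `Loader`, `PigCtx` and `LoadSpec` are hypothetical stand-ins for UDFContext, HBaseStorage, PigContext and a plan node that instantiates its loader while being deserialized, and the property key `caster` with values `binary`/`string` is made up. Clearing `Ctx` between serialization and deserialization is an assumption that mimics a fresh executor JVM in which nothing has been initialized yet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;

public class ContextOrderDemo {

    // Hypothetical stand-in for UDFContext: per-JVM state that stays null
    // until something initializes it (as in a fresh executor JVM).
    static final class Ctx {
        static Properties clientSystemProps;
    }

    // Hypothetical stand-in for HBaseStorage: like the constructor quoted above,
    // it dereferences the client properties without a null check.
    static final class Loader {
        final String defaultCaster;

        Loader() {
            defaultCaster = Ctx.clientSystemProps.getProperty("caster", "string");
        }
    }

    // Hypothetical stand-in for PigContext: the custom writeObject/readObject
    // ship the client properties and restore them on deserialization.
    static final class PigCtx implements Serializable {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeObject(Ctx.clientSystemProps);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            Ctx.clientSystemProps = (Properties) in.readObject();
        }
    }

    // Hypothetical plan node that instantiates its loader while being
    // deserialized, so the Loader constructor runs inside readObject.
    static final class LoadSpec implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Loader loader; // not serialized; rebuilt in readObject

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            loader = new Loader();
        }
    }

    // Serializes on the "client", clears Ctx to mimic the executor JVM,
    // deserializes, and returns the caster the loader ended up with.
    static String simulate(boolean shipContext) throws IOException, ClassNotFoundException {
        Properties client = new Properties();
        client.setProperty("caster", "binary");
        Ctx.clientSystemProps = client;

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            if (shipContext) {
                out.writeObject(new PigCtx()); // written, and therefore read, before the plan node
            }
            out.writeObject(new LoadSpec());
        }

        Ctx.clientSystemProps = null; // executor side: nothing initialized yet
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            if (shipContext) {
                in.readObject(); // PigCtx.readObject restores Ctx.clientSystemProps
            }
            LoadSpec spec = (LoadSpec) in.readObject(); // runs the Loader constructor
            return spec.loader.defaultCaster;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with fix: " + simulate(true)); // prints "with fix: binary"
        try {
            simulate(false);
        } catch (NullPointerException e) {
            System.out.println("without fix: NullPointerException");
        }
    }
}
```

The sketch only works because the context object precedes, in the stream, the object whose constructor reads it; it illustrates the idea of carrying per-JVM context inside PigContext's serialized form and makes no claim about how Spark orders the objects it ships.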
However, PIG-4866 and this solution do *not conflict*: only four variables, PigContext#exec_type, PigContext#packageImportList, UDFContext#udfConfs and UDFContext#clientSystemProps, are serialized and deserialized in Spark mode. I also expect PigContext#readObject and PigContext#writeObject to be used only in Spark mode, since we no longer serialize and deserialize PigContext in MR/Tez mode.

> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
>                 Key: PIG-4920
>                 URL: https://issues.apache.org/jira/browse/PIG-4920
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4920.patch, PIG-4920_2.patch, PIG-4920_3.patch
>
>
> udf.pig
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
>     return 'Hello, World';
> }
>
> complex.outputSchema = "word:chararray";
> function complex(word){
>     return {word:word};
> }
> {code}
> Running udf.pig in Spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in Spark yarn-client mode (export SPARK_MASTER="yarn-client") fails with an error like the following:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
>         ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
>         at org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
>         at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
>         ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from UDFContext
>         at org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
>         ... 91 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)