[ https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450004#comment-15450004 ]
Rohini Palaniswamy commented on PIG-4920:
-----------------------------------------

Liyun,
   This approach is not going to work, for the following reasons:
   - We should not put any if (mr/tez/spark) conditions in main code; we only do that in test cases. When we move to Maven (hopefully that will happen sometime), the Spark code will be in its own module and SparkExecType will not be available to the pig-core module.
   - PigContext is very heavy, and serializing it costs a lot in terms of performance. PigContext is also not actually necessary in backend processing, so you should avoid serializing it in the first place, which is what PIG-4866 does. The current patch serializes the UDFContext and the client properties as part of PigContext even though they are already part of the object, doubling the size and making it worse.

You should call MapRedUtil.setupUDFContext(jobConf); as the first thing in all threads used for execution, which is what MR and Tez do. I wish we could get rid of this whole ThreadLocal business, as setting it up is very messy in general, but it is required for local mode processing.
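
To illustrate why the setup has to happen first in every execution thread, here is a minimal, self-contained sketch. It does not use Pig's classes: UDF_CONTEXT, setupUdfContext, scriptPath, runTask and the "script.path" key are hypothetical stand-ins for the thread-local UDFContext, MapRedUtil.setupUDFContext(jobConf) and the script path lookup in JsScriptEngine. A thread-local value set on the client thread is not visible on a worker thread, so each worker has to populate its own copy before any UDF code reads it.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalSetupSketch {
    // Hypothetical stand-in for a per-thread UDF context (not Pig's UDFContext).
    static final ThreadLocal<Properties> UDF_CONTEXT =
            ThreadLocal.withInitial(Properties::new);

    // Hypothetical analogue of MapRedUtil.setupUDFContext(jobConf):
    // copies the job configuration into the current thread's context.
    static void setupUdfContext(Properties jobConf) {
        Properties ctx = new Properties();
        ctx.putAll(jobConf);
        UDF_CONTEXT.set(ctx);
    }

    // Mimics code that fails when the current thread's context is empty.
    // "script.path" is an assumed key used only for this illustration.
    static String scriptPath() {
        String path = UDF_CONTEXT.get().getProperty("script.path");
        if (path == null) {
            throw new IllegalStateException(
                    "could not get script path from UDFContext");
        }
        return path;
    }

    // Runs the lookup on a separate worker thread, optionally doing the
    // per-thread setup as the very first step of the task.
    static String runTask(Properties jobConf, boolean setUpFirst) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> {
                if (setUpFirst) {
                    setupUdfContext(jobConf);
                }
                return scriptPath();
            });
            return result.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties jobConf = new Properties();
        jobConf.setProperty("script.path", "udf.js");
        // Setting the context on the calling thread does not help the worker.
        setupUdfContext(jobConf);
        System.out.println(runTask(jobConf, true)); // prints "udf.js"
    }
}
```

Calling runTask(jobConf, false) in this sketch fails with an ExecutionException whose cause is the IllegalStateException, even though the calling thread has already set its own context.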
> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
>                 Key: PIG-4920
>                 URL: https://issues.apache.org/jira/browse/PIG-4920
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4920.patch, PIG-4920_2.patch, PIG-4920_3.patch
>
>
> udf.pig
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
>     return 'Hello, World';
> }
>
> complex.outputSchema = "word:chararray";
> function complex(word){
>     return {word:word};
> }
> {code}
> Running udf.pig in spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in spark yarn client mode (export SPARK_MASTER="yarn-client") fails with the following error message:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
>         ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
>         at org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
>         at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
>         ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from UDFContext
>         at org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
>         ... 91 more
> {noformat}