[ 
https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450004#comment-15450004
 ] 

Rohini Palaniswamy commented on PIG-4920:
-----------------------------------------

Liyun,
     This approach is not going to work, for the following reasons:
   - We should not have any if(mr/tez/spark) conditions in the main code; we 
only do that in test cases. When we move to maven (hopefully that will happen 
sometime), the spark code will be in its own module and SparkExecType will 
not be available to the pig-core module.
   - PigContext is very heavy, and serializing it costs a lot in terms of 
performance. PigContext is also not actually necessary for backend 
processing, so you should avoid serializing it in the first place, which is 
what PIG-4866 does. The current patch serializes the udfcontext and the 
client properties as part of PigContext even though they are already part of 
the object, doubling the size and making things worse.

You should call MapRedUtil.setupUDFContext(jobConf); as the first thing in 
all threads used for execution, which is what MR and Tez do. I wish we could 
get rid of this whole ThreadLocal business, as setting it up is very messy in 
general, but it is required for local mode processing.
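
To illustrate why the setup has to happen in every execution thread, here is a 
minimal, self-contained sketch of the ThreadLocal behaviour. It does not use 
Pig's classes: ThreadLocalContextSketch, setupContext, scriptPath, 
readInWorker and the "script.path" key are hypothetical stand-ins, with 
setupContext only playing the role that MapRedUtil.setupUDFContext(jobConf) 
plays in the real code.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a thread-local UDF context; not Pig's actual class.
public class ThreadLocalContextSketch {

    // Each thread has its own slot; a freshly started thread sees null here.
    private static final ThreadLocal<Properties> CONTEXT = new ThreadLocal<>();

    // Stand-in for the per-thread setup call: copy the job configuration
    // into the calling thread's context.
    static void setupContext(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        CONTEXT.set(copy);
    }

    // Stand-in for a UDF initializer that needs the script path.
    // The "script.path" key is an assumption made for this sketch.
    static String scriptPath() {
        Properties ctx = CONTEXT.get();
        return ctx == null ? null : ctx.getProperty("script.path");
    }

    // Looks up the script path on a new worker thread, optionally running
    // the setup there first.
    static String readInWorker(Properties jobConf, boolean setupFirst) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(() -> {
                if (setupFirst) {
                    setupContext(jobConf);
                }
                return scriptPath();
            }).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties jobConf = new Properties();
        jobConf.setProperty("script.path", "udf.js");

        // Setting the context on the main thread does not help worker threads.
        setupContext(jobConf);
        System.out.println("worker without setup: " + readInWorker(jobConf, false));
        System.out.println("worker with setup: " + readInWorker(jobConf, true));
    }
}
```

In this sketch the worker that skips the setup gets null even though the main 
thread populated its own context, which is the same kind of gap that leads to 
"could not get script path from UDFContext" in the quoted trace.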




> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
>                 Key: PIG-4920
>                 URL: https://issues.apache.org/jira/browse/PIG-4920
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4920.patch, PIG-4920_2.patch, PIG-4920_3.patch
>
>
> udf.pig 
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as 
> myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
>     return 'Hello, World';
> }
>     
> complex.outputSchema = "word:chararray";
> function complex(word){
>     return {word:word};
> }
> {code}
> Running udf.pig in spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in spark yarn client mode (export SPARK_MASTER="yarn-client") 
> fails with an error message like the following:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
>         ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
>         at 
> org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
>         at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
>         ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from 
> UDFContext
>         at 
> org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
>         ... 91 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
