[
https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liyunzhang_intel updated PIG-4920:
----------------------------------
Attachment: PIG-4920.patch
[~mohitsabharwal], please help review:
UDFContext.getUDFContext() returns the value of UDFContext#tss, which is a
ThreadLocal variable. A ThreadLocal value cannot be serialized or deserialized,
and each thread sees its own value. As a result, an
[exception|https://github.com/apache/pig/blob/spark/src/org/apache/pig/scripting/js/JsScriptEngine.java#L66]
is thrown when
UDFContext.getUDFContext().getUDFProperties(JsFunction.class).get(JsScriptEngine.class.getName()+".scriptFile")
is called in a Spark executor thread. The exception is thrown in Spark mode but
not in MR mode because, in Spark, all objects are deserialized before
UDFContext is initialized (UDFContext#deserialize).
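To illustrate the ThreadLocal behaviour described above, here is a minimal standalone sketch. It is not Pig code: ThreadLocalContextDemo, TSS and the "scriptFile" key are hypothetical stand-ins for UDFContext, UDFContext#tss and the script-file property.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for UDFContext: each thread gets its own Properties.
class ThreadLocalContextDemo {
    private static final ThreadLocal<Properties> TSS =
            ThreadLocal.withInitial(Properties::new);

    // Returns the context that belongs to the calling thread.
    static Properties getContext() {
        return TSS.get();
    }

    // Looks up a key from a freshly started thread and returns what that thread sees.
    static String readFromNewThread(String key) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(getContext().getProperty(key)));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        // Value set by the "client" thread.
        getContext().setProperty("scriptFile", "udf.js");
        System.out.println("main thread:  " + getContext().getProperty("scriptFile"));
        // A different thread starts with an empty context, so the lookup yields null.
        System.out.println("other thread: " + readFromNewThread("scriptFile"));
    }
}
```

The second line prints null: a value set in one thread is simply not visible from another thread unless something copies it there explicitly.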
In MR: PigGenericMapBase#setup -> MapRedUtil#setupUDFContext ->
UDFContext#deserialize -> JsScriptEngine.Holder
In Spark: JsScriptEngine.Holder -> PigInputFormat#createRecordReader
-> MapRedUtil#setupUDFContext -> UDFContext#deserialize
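The effect of the two orderings above can be reproduced with a small standalone sketch. It assumes only that a static holder class reads a context property in its static initializer; InitOrderDemo, CONTEXT, EarlyHolder and LateHolder are hypothetical names, and the real JsScriptEngine.Holder does more than this.

```java
import java.util.Properties;

class InitOrderDemo {
    // Hypothetical stand-in for the properties that UDFContext#deserialize restores.
    static final Properties CONTEXT = new Properties();

    // Two holders with identical logic, so each ordering gets its own
    // one-time static initialization.
    static class EarlyHolder {
        static final String SCRIPT = requireScript();
    }

    static class LateHolder {
        static final String SCRIPT = requireScript();
    }

    // Fails the same way the holder in the stack trace does when the property is missing.
    static String requireScript() {
        String path = CONTEXT.getProperty("scriptFile");
        if (path == null) {
            throw new IllegalStateException("could not get script path from UDFContext");
        }
        return path;
    }

    // Spark-like order: the holder is touched before the context is restored.
    static boolean holderBeforeContextFails() {
        try {
            String unused = EarlyHolder.SCRIPT;
            return false;
        } catch (LinkageError e) {
            // ExceptionInInitializerError on first access, NoClassDefFoundError on
            // later ones; both are LinkageErrors.
            return true;
        }
    }

    // MR-like order: the context is restored first, then the holder initializes.
    static String contextBeforeHolder() {
        CONTEXT.setProperty("scriptFile", "udf.js");
        return LateHolder.SCRIPT;
    }

    public static void main(String[] args) {
        System.out.println("holder first fails: " + holderBeforeContextFails());
        System.out.println("context first: " + contextBeforeHolder());
    }
}
```

Because a static initializer runs only once, a holder that fails on first access stays unusable for the lifetime of the class loader, which is why the ordering matters.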
Changes in the patch (the approach is similar to what we did in PIG-4295):
1. Serialize UDFContext#udfConfs and UDFContext#clientSysProps in
UDFContext#serializeUDFContextInPigContext.
2. Deserialize UDFContext#udfConfs and UDFContext#clientSysProps in
UDFContext#deserializeFromPigContext.
3. UDFContext#serializeUDFContextInPigContext is called in
SparkUtil#newJobConf.
4. UDFContext#deserializeFromPigContext is called in
PigContext#readObject.
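The general idea behind these four steps can be sketched as follows. This is not the patch itself: Carrier is a hypothetical stand-in for PigContext, captureUdfProps for UDFContext#serializeUDFContextInPigContext, and the readObject hook for the call to UDFContext#deserializeFromPigContext. The real methods and fields may look different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

class ContextCarrierDemo {
    // Hypothetical thread-local UDF context, as in UDFContext#tss.
    static final ThreadLocal<Properties> UDF_PROPS = ThreadLocal.withInitial(Properties::new);

    // Hypothetical stand-in for PigContext: a serializable object shipped to executors.
    static class Carrier implements Serializable {
        private static final long serialVersionUID = 1L;
        private Properties snapshot = new Properties();

        // Steps 1 and 3: copy the caller's thread-local state into a serializable field.
        void captureUdfProps() {
            snapshot = new Properties();
            snapshot.putAll(UDF_PROPS.get());
        }

        // Steps 2 and 4: restore the state into the deserializing thread's context.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            UDF_PROPS.get().putAll(snapshot);
        }
    }

    static byte[] serialize(Carrier c) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(c);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserializes in a new thread (the "executor") and reads the key there.
    static String readInNewThread(byte[] bytes, String key) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread t = new Thread(() -> {
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                ois.readObject(); // the readObject hook repopulates this thread's context
                seen.set(UDF_PROPS.get().getProperty(key));
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        UDF_PROPS.get().setProperty("scriptFile", "udf.js");
        Carrier carrier = new Carrier();
        carrier.captureUdfProps();
        System.out.println(readInNewThread(serialize(carrier), "scriptFile"));
    }
}
```

Restoring inside readObject means the context is populated as soon as the carrier is deserialized, before any later-deserialized object can ask for it.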
> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
> Key: PIG-4920
> URL: https://issues.apache.org/jira/browse/PIG-4920
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4920.patch
>
>
> udf.pig
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as
> myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
> return 'Hello, World';
> }
>
> complex.outputSchema = "word:chararray";
> function complex(word){
> return {word:word};
> }
> {code}
> Running udf.pig in Spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in Spark yarn-client mode (export SPARK_MASTER="yarn-client")
> fails with an error message like the following:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
> ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
> at org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
> at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
> ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from UDFContext
> at org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
> ... 91 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)