[ https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436324#comment-15436324 ]

liyunzhang_intel commented on PIG-4920:
---------------------------------------

[~rohini]:
  Since you are reviewing Pig on Spark, here is more detail on PIG-4611 and PIG-4265 (both are caused by the same problem). While fixing PIG-4920 I ran into the same problem again, so I am explaining it here to help the review.
The root cause is that in Spark mode, UDFContext#udfConfs is initialized only after all objects have been deserialized, whereas in MR/Tez mode it is initialized before them.
Let's use TestHBaseStorage#testLoadWithProjection_1 as an example.
HBaseStorage#defaultCaster is set in the constructor of HBaseStorage, and its value comes from UDFContext.getUDFContext().getClientSystemProps():
{code}
public HBaseStorage(String columnList, String optString) throws ParseException, IOException {
    …
    String defaultCaster = UDFContext.getUDFContext().getClientSystemProps()
            .getProperty(CASTER_PROPERTY, STRING_CASTER);

}
{code}
In MR mode, UDFContext is initialized first (PigGenericMapBase#setup -> MapRedUtil.setupUDFContext -> UDFContext#deserialize), and only then is the HBaseStorage constructor called.
In Spark mode, Spark deserializes all objects first and then starts the executor to run the program. So the HBaseStorage constructor runs first, and UDFContext is initialized only afterwards (PigInputFormatSpark#createRecordReader -> MapRedUtil#setupUDFContext -> UDFContext#deserialize). As a result, an NPE is thrown in this situation (which is why, in PIG-4611, I check whether UDFContext.getUDFContext().getClientSystemProps().getProperty(CASTER_PROPERTY, STRING_CASTER) is null).
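To make the ordering problem concrete, here is a minimal, self-contained Java sketch. `FakeUdfContext`, `FakeLoader`, and the property name and values are hypothetical stand-ins, not Pig classes: the context's properties stay null until `deserialize` is called, and the loader reads them in its constructor the way HBaseStorage does. `mrOrder` populates the context before constructing the loader. `sparkOrder` constructs the loader first, which throws an NPE unless the constructor guards against a null context, in the spirit of the PIG-4611 check.

```java
import java.util.Properties;

public class InitOrderDemo {
    // Hypothetical stand-in for UDFContext: client system properties stay null
    // until deserialize() runs, like an executor before MapRedUtil.setupUDFContext.
    static final class FakeUdfContext {
        private static Properties clientSysProps;

        static Properties getClientSystemProps() { return clientSysProps; }
        static void deserialize(Properties props) { clientSysProps = props; }
        static void reset() { clientSysProps = null; }
    }

    // Hypothetical loader that, like HBaseStorage, reads the caster in its constructor.
    // The property name and values are placeholders, not HBaseStorage's constants.
    static final class FakeLoader {
        static final String CASTER_PROPERTY = "example.caster";
        static final String STRING_CASTER = "string";
        final String defaultCaster;

        FakeLoader(boolean guarded) {
            Properties props = FakeUdfContext.getClientSystemProps();
            if (guarded && props == null) {
                // Null check in the spirit of PIG-4611: fall back to the default.
                defaultCaster = STRING_CASTER;
            } else {
                // Throws NullPointerException when the context is not populated yet.
                defaultCaster = props.getProperty(CASTER_PROPERTY, STRING_CASTER);
            }
        }
    }

    // MR order: the context is populated first, then the constructor runs.
    static String mrOrder(Properties props) {
        FakeUdfContext.reset();
        FakeUdfContext.deserialize(props);
        return new FakeLoader(false).defaultCaster;
    }

    // Spark order: the constructor runs first, the context is populated too late.
    static String sparkOrder(Properties props, boolean guarded) {
        FakeUdfContext.reset();
        String caster = new FakeLoader(guarded).defaultCaster;
        FakeUdfContext.deserialize(props);
        return caster;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(FakeLoader.CASTER_PROPERTY, "binary");
        System.out.println(mrOrder(props));          // binary
        System.out.println(sparkOrder(props, true)); // string (configured value ignored)
        try {
            sparkOrder(props, false);
        } catch (NullPointerException e) {
            System.out.println("NPE in constructor");
        }
    }
}
```

In this sketch the guarded variant avoids the NPE but returns the default caster, so the configured value is silently ignored; only changing the initialization order lets the constructor see the real setting.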
The solution is to serialize UDFContext#udfConfs and UDFContext#clientSystemProperties when PigContext is serialized in PigContext#writeObject, and to deserialize them when PigContext is deserialized in PigContext#readObject (see PIG-4920_3.patch). Since *PIG-4866*, we no longer serialize PigContext into the configuration sent to the backend. But PIG-4866 and this solution do *not conflict*: only four variables (PigContext#exec_type, PigContext#packageImportList, UDFContext#udfConfs, and UDFContext#clientSystemProps) are serialized and deserialized, and only in Spark mode. I also think PigContext#readObject and PigContext#writeObject will only be used in Spark mode, since we no longer serialize and deserialize PigContext in MR/Tez mode.
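The serialization approach can be sketched with Java's custom `writeObject`/`readObject` hooks. This is a simplified illustration, not the PIG-4920_3.patch code: `SharedContext` is a hypothetical static holder standing in for UDFContext (the real one is obtained through UDFContext.getUDFContext(), and udfConfs is simplified here to a `Properties`), and `CarrierContext` is a hypothetical stand-in for PigContext. Because `readObject` restores the context while the carrier itself is being deserialized, objects deserialized or constructed after it can read the properties.

```java
import java.io.*;
import java.util.Properties;

public class ContextCarrierDemo {
    // Hypothetical per-JVM context, standing in for UDFContext's state.
    static final class SharedContext {
        static Properties clientSysProps; // null in a fresh executor JVM
        static Properties udfConfs;       // simplified; not Pig's real type
    }

    // Hypothetical stand-in for PigContext: writes its own fields plus the
    // context fields, and restores both on deserialization.
    static final class CarrierContext implements Serializable {
        private static final long serialVersionUID = 1L;
        String execType = "SPARK"; // analogue of PigContext#exec_type

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // execType
            out.writeObject(SharedContext.clientSysProps);
            out.writeObject(SharedContext.udfConfs);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Populate the shared context during deserialization, before any
            // later object (e.g. a loader) reads it.
            SharedContext.clientSysProps = (Properties) in.readObject();
            SharedContext.udfConfs = (Properties) in.readObject();
        }
    }

    /** Returns the caster visible after a simulated driver-to-executor round trip. */
    static String roundTrip(String caster) {
        try {
            // Driver side: the context is populated, then the carrier is serialized.
            SharedContext.clientSysProps = new Properties();
            SharedContext.clientSysProps.setProperty("caster", caster);
            SharedContext.udfConfs = new Properties();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new CarrierContext());
            }

            // Executor side: simulate a fresh JVM by clearing the static state.
            SharedContext.clientSysProps = null;
            SharedContext.udfConfs = null;
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject(); // readObject above restores SharedContext
            }
            return SharedContext.clientSysProps.getProperty("caster");
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("binary")); // binary
    }
}
```

In this sketch the context is only available to objects that are deserialized or constructed after the carrier, so the carrier must come first in the deserialization order.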



> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
>                 Key: PIG-4920
>                 URL: https://issues.apache.org/jira/browse/PIG-4920
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4920.patch, PIG-4920_2.patch, PIG-4920_3.patch
>
>
> udf.pig 
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
>     return 'Hello, World';
> }
>     
> complex.outputSchema = "word:chararray";
> function complex(word){
>     return {word:word};
> }
> {code}
> Running udf.pig in Spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in Spark yarn-client mode (export SPARK_MASTER="yarn-client") fails with an error like the following:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
>         ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
>         at org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
>         at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
>         ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from UDFContext
>         at org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
>         ... 91 more
> {noformat}
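The innermost cause in this trace is an IllegalStateException thrown from JsScriptEngine$Holder.<clinit>, surfaced as ExceptionInInitializerError at JsScriptEngine.getInstance. That pattern suggests a lazily initialized nested holder class whose static initializer reads the script path from UDFContext. The sketch below is a hypothetical reproduction of that failure mode, not Pig's code: `scriptPath` stands in for the value looked up in UDFContext. It also shows a standard JVM rule that makes the ordering problem worse: once a class's static initializer fails, the class is left in an erroneous state and every later access throws NoClassDefFoundError, even if the path becomes available afterwards.

```java
public class HolderInitDemo {
    // Hypothetical shared state, standing in for the script path in UDFContext.
    static volatile String scriptPath; // null until the context is populated

    // Initialization-on-demand holder: INSTANCE is computed in <clinit> on first access.
    static final class Holder {
        static final String INSTANCE = create();

        private static String create() {
            if (scriptPath == null) {
                throw new IllegalStateException("could not get script path from context");
            }
            return "engine for " + scriptPath;
        }
    }

    static String getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        try {
            getInstance(); // first access runs <clinit>, which throws
        } catch (ExceptionInInitializerError e) {
            System.out.println("first access: " + e.getCause());
        }
        scriptPath = "udf.js"; // too late: Holder is now in an erroneous state
        try {
            getInstance();
        } catch (NoClassDefFoundError e) {
            System.out.println("second access: NoClassDefFoundError");
        }
    }
}
```

Because the failure is sticky for the lifetime of the class loader, a guard that merely retries later cannot recover; the context has to be populated before the first access.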



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
