[
https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436324#comment-15436324
]
liyunzhang_intel commented on PIG-4920:
---------------------------------------
[~rohini]:
Since you are reviewing Pig on Spark, here is more background on PIG-4611 and
PIG-4265 (both are caused by the same problem). While fixing PIG-4920 I ran into
the same problem again, so I explain it in more detail here to help the review.
In spark mode, UDFContext#udfConfs is initialized after all objects have been
deserialized, whereas the order is different in mr/tez mode.
Let's use TestHBaseStorage#testLoadWithProjection_1 as an example:
HBaseStorage#defaultCaster is set in the constructor of HBaseStorage; its value
comes from UDFContext.getUDFContext().getClientSystemProps().
{code}
public HBaseStorage(String columnList, String optString)
        throws ParseException, IOException {
    ….
    String defaultCaster = UDFContext.getUDFContext().getClientSystemProps()
            .getProperty(CASTER_PROPERTY, STRING_CASTER);
}
{code}
In mr mode, we initialize UDFContext first (PigGenericMapBase#setup ->
MapRedUtil.setupUDFContext -> UDFContext#deserialize) and only then call the
HBaseStorage constructor.
In spark mode, Spark deserializes all objects first and then starts the executor
to run the program. The HBaseStorage constructor therefore runs before
UDFContext is initialized (PigInputFormatSpark#createRecordReader ->
MapRedUtil#setupUDFContext -> UDFContext#deserialize), so an NPE is thrown in
this situation (this is why, in PIG-4611, I check whether
UDFContext.getUDFContext().getClientSystemProps().getProperty(CASTER_PROPERTY,
STRING_CASTER) is null or not).
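The ordering problem can be sketched in isolation. This is only an
illustration: SketchContext and SketchLoader are hypothetical stand-ins, not
the real UDFContext and HBaseStorage, and the property name and caster values
are made up. It shows the constructor working when the context is initialized
first (mr order), failing with an NPE when it runs first (spark order), and
falling back to a default when a null guard is added.

```java
import java.util.Properties;

// Hypothetical stand-in for UDFContext: the properties stay null until
// deserialize() has been called.
final class SketchContext {
    private static Properties clientSystemProps;

    static void deserialize(Properties props) { clientSystemProps = props; }
    static void reset() { clientSystemProps = null; }
    static Properties getClientSystemProps() { return clientSystemProps; }
}

// Hypothetical stand-in for a loader that reads the context in its constructor.
final class SketchLoader {
    static final String CASTER_PROPERTY = "sketch.caster"; // assumed name
    static final String STRING_CASTER = "StringCaster";    // assumed default
    final String defaultCaster;

    SketchLoader(boolean guardAgainstNull) {
        Properties props = SketchContext.getClientSystemProps();
        if (guardAgainstNull && props == null) {
            // Context not initialized yet: fall back to the default.
            defaultCaster = STRING_CASTER;
        } else {
            // Throws NullPointerException when props is still null.
            defaultCaster = props.getProperty(CASTER_PROPERTY, STRING_CASTER);
        }
    }
}

final class InitOrderSketch {
    // mr-like order: initialize the context, then construct the loader.
    static String mrOrder() {
        SketchContext.reset();
        Properties p = new Properties();
        p.setProperty(SketchLoader.CASTER_PROPERTY, "BinaryCaster");
        SketchContext.deserialize(p);
        return new SketchLoader(false).defaultCaster;
    }

    // spark-like order: the loader is constructed before the context exists.
    static String sparkOrder(boolean guardAgainstNull) {
        SketchContext.reset();
        try {
            return new SketchLoader(guardAgainstNull).defaultCaster;
        } catch (NullPointerException e) {
            return "NPE";
        }
    }
}
```

Note that the guarded variant avoids the NPE but silently ignores whatever
value the client had configured, which is why a guard alone is only a
workaround.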
The solution is to serialize UDFContext#udfConfs and
UDFContext#clientSystemProperties when the program serializes PigContext in
PigContext#writeObject, and to deserialize them when it deserializes PigContext
in PigContext#readObject (see PIG-4920_3.patch). After *PIG-4866*, we no longer
serialize PigContext into the configuration sent to the backend. But PIG-4866
and this solution do *not conflict*: only four variables (PigContext#exec_type,
PigContext#packageImportList, UDFContext#udfConfs and
UDFContext#clientSystemProps) are serialized and deserialized in spark mode.
I also think PigContext#readObject and PigContext#writeObject will only be used
in spark mode, since we no longer serialize and deserialize PigContext in
mr/tez mode.
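The general technique (carrying process-wide state along inside a custom
writeObject/readObject pair) can be sketched as follows. This is not the code
from PIG-4920_3.patch: SketchUdfState and SketchSerializableContext are
hypothetical stand-ins for UDFContext and PigContext, and the execType field
and the property key are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;

// Hypothetical stand-in for UDFContext: process-wide state, null until set.
final class SketchUdfState {
    static Properties clientSystemProps;
}

// Hypothetical stand-in for PigContext.
final class SketchSerializableContext implements Serializable {
    private static final long serialVersionUID = 1L;
    String execType = "SPARK"; // placeholder instance field

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Ship the static state together with this object.
        out.writeObject(SketchUdfState.clientSystemProps);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Restore the static state as a side effect of deserialization, so it
        // is available to any constructor that runs afterwards.
        SketchUdfState.clientSystemProps = (Properties) in.readObject();
    }
}

final class RoundTripSketch {
    static String roundTrip() {
        try {
            Properties p = new Properties();
            p.setProperty("sketch.key", "value");
            SketchUdfState.clientSystemProps = p;

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new SketchSerializableContext());
            }

            // Simulate a fresh executor JVM where the state is not set yet.
            SketchUdfState.clientSystemProps = null;

            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return SketchUdfState.clientSystemProps.getProperty("sketch.key");
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the state is only restored if the carrier object is
deserialized before the objects that depend on it are constructed.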
> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
> Key: PIG-4920
> URL: https://issues.apache.org/jira/browse/PIG-4920
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4920.patch, PIG-4920_2.patch, PIG-4920_3.patch
>
>
> udf.pig
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
> return 'Hello, World';
> }
>
> complex.outputSchema = "word:chararray";
> function complex(word){
> return {word:word};
> }
> {code}
> Running udf.pig in spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in spark yarn client mode (export SPARK_MASTER="yarn-client")
> fails with an error message like the following:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
> ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
> at org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
> at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
> ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from UDFContext
> at org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
> ... 91 more
> {noformat}