[ 
https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615915#comment-14615915
 ] 

Mohit Sabharwal commented on PIG-4611:
--------------------------------------

Thanks for the explanation and addressing this issue, [~kellyzly]!!!

Let me know if I understand this correctly:

1) Spark Executor will serialize all objects referenced in supplied closures. 
Since UDFContext.getUDFContext() is not initialized (because Spark does not 
expose a setup() interface like MR), we always default defaultCaster to 
STRING_CASTER.

2) However, later on, in the *same* Executor thread, the record reader creation 
will correctly deserialize the UDFContext from JobConf 
(PigInputFormatSpark.createRecordReader->PigInputFormat.createRecordReader->MapRedUtil.setupUDFContext->UDFContext.deserialize).

3) Next, in the same Executor thread, when HBaseStorage is initialized by the 
load function, it will find a correctly populated UDFContext.
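
To make the ordering in 1) through 3) concrete, here is a minimal, self-contained 
Java sketch of how I picture it. It is not Pig code: FakeContext, CASTER_KEY and 
BINARY_CASTER are hypothetical stand-ins I made up for illustration, and only the 
STRING_CASTER fallback comes from the discussion above. It assumes the context is 
per-thread and stays empty until something deserializes into it.

```java
import java.util.Properties;

public class CasterFallbackSketch {
    // Hypothetical stand-in for a per-thread UDF context; NOT Pig's UDFContext.
    static final class FakeContext {
        private static final ThreadLocal<FakeContext> CURRENT =
                ThreadLocal.withInitial(FakeContext::new);
        private Properties clientProps; // stays null until deserialize() is called

        static FakeContext get() { return CURRENT.get(); }

        // Stands in for "record reader creation populates the context from JobConf".
        void deserialize(Properties fromJobConf) { this.clientProps = fromJobConf; }

        Properties getClientProps() { return clientProps; }
    }

    static final String STRING_CASTER = "STRING_CASTER"; // fallback named in the comment
    static final String CASTER_KEY = "example.caster";   // hypothetical property key

    // Resolve the caster name; fall back when the context is not populated yet.
    static String resolveCaster() {
        Properties props = FakeContext.get().getClientProps();
        if (props == null) {
            return STRING_CASTER; // step 1: nothing has initialized the context
        }
        return props.getProperty(CASTER_KEY, STRING_CASTER);
    }

    public static void main(String[] args) {
        String early = resolveCaster(); // step 1: context is still empty

        Properties jobConf = new Properties();
        jobConf.setProperty(CASTER_KEY, "BINARY_CASTER"); // hypothetical caster name
        FakeContext.get().deserialize(jobConf);           // step 2: same thread

        String late = resolveCaster(); // step 3: context is now populated
        System.out.println(early + " -> " + late);
    }
}
```

If the real flow matches this, the storage function only sees the configured 
caster when it is initialized after the context has been deserialized on that 
thread.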

This sounds reasonable to me. Since this is a core change, could you please add 
comments to HBaseStorage.java explaining why we are handling this as a special 
case for Spark?


I assume it is a typo, but the -Dexectype argument needs to be {{spark}}, not 
{{TestHBaseStorage}}, when running TestHBaseStorage:
{code}
ant test -Dhadoopversion=23 -Dtestcase=TestHBaseStorage -Dexectype=spark -DdebugPort=9999
{code}

> Fix remaining unit test failures about "TestHBaseStorage"
> ---------------------------------------------------------
>
>                 Key: PIG-4611
>                 URL: https://issues.apache.org/jira/browse/PIG-4611
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4611.patch
>
>
> In https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/, it 
> shows following unit test failures about TestHBaseStorage:
>  org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete  
>  org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_1
>  org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_2        
>  org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection
>  org.apache.pig.test.TestHBaseStorage.testCollectedGroup      
>  org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
