[ 
https://issues.apache.org/jira/browse/PIG-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960207#comment-15960207
 ] 

liyunzhang_intel commented on PIG-5134:
---------------------------------------

[~nkollar]: here is my understanding of this JIRA.
You provided two options to solve the issue:
1. PIG-5134.patch: use Kryo 4.0.0 serialization. However, with this patch,
test cases involving HiveUDF and ORC will fail, because they need Kryo 2.22.
2. PIG-5134.1.patch: implement the readObject and writeObject methods in
AvroTupleWrapper instead of using Kryo. But customized wrapper classes like
AvroTupleWrapper will still be broken.
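For reference, the second option relies on Java's custom-serialization hooks: a class can declare private writeObject/readObject methods that convert a non-Serializable field to and from a portable form. The Avro-specific details are omitted here; this is a minimal JDK-only sketch in which a hypothetical NonSerializablePayload class stands in for GenericData$Record (in the actual patch the field would be re-encoded with Avro's datum writer and reader, schema included).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class AvroWrapperSketch {

    // Stand-in for org.apache.avro.generic.GenericData$Record, which does
    // not implement Serializable and therefore breaks default serialization.
    static final class NonSerializablePayload {
        final String data;
        NonSerializablePayload(String data) { this.data = data; }
    }

    // Wrapper in the spirit of AvroTupleWrapper: mark the problematic field
    // transient and handle it manually in writeObject/readObject.
    static final class Wrapper implements Serializable {
        private transient NonSerializablePayload payload;

        Wrapper(NonSerializablePayload payload) { this.payload = payload; }

        NonSerializablePayload getPayload() { return payload; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            // In the real patch this would be an Avro binary encoding of the
            // record plus its schema; a plain string suffices for the sketch.
            out.writeUTF(payload.data);
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            payload = new NonSerializablePayload(in.readUTF());
        }
    }

    // Serialize and deserialize the wrapper, as Spark would when shipping
    // a task result, to show the transient field survives the round trip.
    static Wrapper roundTrip(Wrapper w) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(w);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Wrapper) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Wrapper copy = roundTrip(new Wrapper(new NonSerializablePayload("stuff in closet")));
        System.out.println(copy.getPayload().data);
    }
}
```

The drawback noted above remains: only wrappers that are patched this way benefit, so any other customized wrapper holding a non-Serializable Avro object would still fail in Spark mode.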

If my understanding is right, I suggest excluding TestAvroStorage from the
unit tests and not fixing it in the first release of Pig on Spark. [~rohini],
can you give us some suggestions?

> Fix TestAvroStorage unit test in Spark mode
> -------------------------------------------
>
>                 Key: PIG-5134
>                 URL: https://issues.apache.org/jira/browse/PIG-5134
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5134_2.patch, PIG-5134.patch
>
>
> It seems that test fails, because Avro GenericData#Record doesn't implement 
> Serializable interface:
> {code}
> 2017-02-23 09:14:41,887 ERROR [main] spark.JobGraphBuilder 
> (JobGraphBuilder.java:sparkOperToRDD(183)) - throw exception in 
> sparkOperToRDD: 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 
> in stage 9.0 (TID 9) had a not serializable result: 
> org.apache.avro.generic.GenericData$Record
> Serialization stack:
>       - object not serializable (class: 
> org.apache.avro.generic.GenericData$Record, value: {"key": "stuff in closet", 
> "value1": {"thing": "hat", "count": 7}, "value2": {"thing": "coat", "count": 
> 2}})
>       - field (class: org.apache.pig.impl.util.avro.AvroTupleWrapper, name: 
> avroObject, type: interface org.apache.avro.generic.IndexedRecord)
>       - object (class org.apache.pig.impl.util.avro.AvroTupleWrapper, 
> org.apache.pig.impl.util.avro.AvroTupleWrapper@3d3a58c1)
>       at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>       at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
> {code}
> The failing test is a new one introduced when merging trunk to the spark 
> branch, which is why we didn't see this error before.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
