[ https://issues.apache.org/jira/browse/HIVE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088162#comment-14088162 ]

Chao commented on HIVE-7569:
----------------------------

(Not sure if it's related.)
Sometimes when I run a multi-insert job in Spark, I get an exception like the 
following. If I run the SAME query in MR mode AND THEN in Spark mode, the query 
succeeds and produces the correct result.

{code}
2014-08-06 12:58:53,168 INFO  [Executor task launch worker-0]: exec.GroupByOperator (Operator.java:initialize(389)) - Initialization Done 35 GBY
2014-08-06 12:58:53,169 ERROR [Executor task launch worker-0]: ExecReducer (ExecReducer.java:reduce(272)) - 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x1x1x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x0x0x255 with properties {columns=_col0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=map<string,string>}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:212)
        at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:60)
        at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:191)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:210)
        ... 15 more
Caused by: java.io.EOFException
        at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:201)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:491)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
        ... 16 more
{code}
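For context, the query I'm running is a multi-insert of roughly the following shape (the table and column names here are placeholders for illustration, not the actual ones from my test):

{code}
-- Hypothetical multi-insert: one scan of src feeding two aggregations
-- into two different target tables, so the plan needs multiple reduce stages.
FROM src
INSERT OVERWRITE TABLE dest1
  SELECT key, count(value) GROUP BY key
INSERT OVERWRITE TABLE dest2
  SELECT value, count(key) GROUP BY value;
{code}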



> Make sure multi-MR queries work
> -------------------------------
>
>                 Key: HIVE-7569
>                 URL: https://issues.apache.org/jira/browse/HIVE-7569
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>
> With the latest dev effort, queries that involve multiple MR jobs should be 
> supported by Spark now, except for sorting, multi-insert, union, and join 
> (map join and SMB might just work). However, this hasn't been verified and 
> tested. This task is to ensure that is the case. Please create JIRAs for any 
> problems found.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
