[ https://issues.apache.org/jira/browse/HIVE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088162#comment-14088162 ]
Chao commented on HIVE-7569:
----------------------------

(Not sure it's related) Sometimes when I run a multi-insertion job in Spark, I get an exception like the following. If I run the SAME query in MR mode AND THEN in Spark mode, the query succeeds and produces the correct result.

{{code}}
2014-08-06 12:58:53,168 INFO  [Executor task launch worker-0]: exec.GroupByOperator (Operator.java:initialize(389)) - Initialization Done 35 GBY
2014-08-06 12:58:53,169 ERROR [Executor task launch worker-0]: ExecReducer (ExecReducer.java:reduce(272)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x1x1x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x0x0x255 with properties {columns=_col0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=map<string,string>}
  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:212)
  at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:60)
  at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
  at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
  at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
  at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
  at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
  at org.apache.spark.scheduler.Task.run(Task.scala:51)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
  at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:191)
  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:210)
  ... 15 more
Caused by: java.io.EOFException
  at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
  at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:201)
  at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:491)
  at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
  ... 16 more
{{code}}

> Make sure multi-MR queries work
> -------------------------------
>
>                 Key: HIVE-7569
>                 URL: https://issues.apache.org/jira/browse/HIVE-7569
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>
> With the latest dev effort, queries that would involve multiple MR jobs
> should be supported by spark now, except for sorting, multi-insert, union,
> and join (map join and smb might just work). However, this hasn't been verified
> and tested. This task is to ensure this is the case. Please create JIRAs for
> problems found.

--
This message was sent by Atlassian JIRA
(v6.2#6252)