lihuahui5683 opened a new issue, #5382:
URL: https://github.com/apache/hudi/issues/5382
**Describe the problem you faced**

The following exception occurs when running a Hive incremental query against the Hudi `xxx_rt` table:
```
22/04/21 10:43:58 INFO scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 8) on Hadoop02, executor 7: java.lang.ClassCastException (org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to org.apache.hadoop.hive.shims.HadoopShimsSecure$InputSplitShim) [duplicate 3]
22/04/21 10:43:58 INFO cluster.YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/21 10:43:58 ERROR client.RemoteDriver: Failed to run client job 60806c1e-f2b0-4ee5-bbab-46f8238f3493
java.util.concurrent.ExecutionException: Exception thrown by job
    at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
    at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:404)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:365)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Aborting TaskSet 1.0 because task 2 (partition 2) cannot run anywhere due to node and executor blacklist. Most recent failure: Lost task 2.0 in stage 1.0 (TID 10, Hadoop02, executor 7): java.lang.ClassCastException: org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to org.apache.hadoop.hive.shims.HadoopShimsSecure$InputSplitShim
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:205)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getRecordReader(HoodieCombineHiveInputFormat.java:979)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:556)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:272)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:271)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:225)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
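For context, the failing frame is `HadoopShimsSecure$CombineFileRecordReader.<init>`, which casts the incoming split to `InputSplitShim`, while Hudi hands it a `HoodieCombineRealtimeFileSplit`. Since the two are sibling subtypes rather than one extending the other, the cast cannot succeed. The sketch below uses simplified stand-in class names (not the real Hadoop/Hudi types) just to illustrate the failure mode:

```java
// Simplified stand-ins: RealtimeSplit mirrors HoodieCombineRealtimeFileSplit,
// SplitShim mirrors HadoopShimsSecure$InputSplitShim. Both extend a common
// parent but are unrelated to each other, so casting between them fails.
class CombineSplit {}
class RealtimeSplit extends CombineSplit {}
class SplitShim extends CombineSplit {}

public class CastDemo {
    public static void main(String[] args) {
        CombineSplit split = new RealtimeSplit();
        try {
            // Mirrors the downcast performed in the record-reader constructor.
            SplitShim shim = (SplitShim) split;
            System.out.println("cast succeeded");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");  // prints this branch
        }
    }
}
```

This suggests the record-reader path taken here was never given a split of the type it expects for realtime (`_rt`) tables.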
I also set the following parameters in the Hive session:
```
add jar hdfs://mycluster/hudi/jars/hudi-hadoop-mr-bundle-0.10.0.jar;
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
set hoodie.role_sync_hive.consume.mode=INCREMENTAL;
set hoodie.role_sync_hive.consume.max.commits=3;
set mapreduce.input.fileinputformat.split.maxsize=128;
set hive.fetch.task.conversion=none;
set hoodie.role_sync_hive.consume.start.timestamp=20220420143200507;
```
The query statement is as follows:
```
select * from role_sync_hive_rt where `_hoodie_commit_time` > '20220420143200507';
```
**Environment Description**
* Hudi version : 0.10.0
* Spark version : 2.4.0_cdh6.3.2
* Hive version : 2.1.1_cdh6.3.2
* Hadoop version : 3.0.0_cdh6.3.2
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no