maheshguptags opened a new issue, #10609:
URL: https://github.com/apache/hudi/issues/10609
I am trying to ingest the data using spark+kafka streaming to hudi table
with the RLI index. but unfortunately ingesting 5-10 records is throwing the
below issue.
Steps to reproduce the behavior:
1. first build dependency for hudi 14 and spark 3.4
2. add hudi RLI index
**Expected behavior**
it should work end to end with RLI index enable
**Environment Description**
* Hudi version : 14
* Spark version : 3.4.0
* Hive version : NA
* Hadoop version : 3.3.4
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : Yes
**Additional context**
Hudi Configuration:
val hudiOptions = Map(
"hoodie.table.name" -> "customer_profile",
"hoodie.datasource.write.recordkey.field" -> "x,y",
"hoodie.datasource.write.partitionpath.field" -> "x",
"hoodie.datasource.write.precombine.field" -> "ts",
"hoodie.table.type" -> "COPY_ON_WRITE",
"hoodie.clean.max.commits" -> "6",
"hoodie.clean.trigger.strategy" -> "NUM_COMMITS",
"hoodie.cleaner.commits.retained" -> "4",
"hoodie.cleaner.parallelism" -> "50",
"hoodie.clean.automatic" -> "true",
"hoodie.clean.async" -> "true",
"hoodie.parquet.compression.codec" -> "snappy",
"hoodie.index.type" -> "RECORD_INDEX",
"hoodie.metadata.record.index.enable" -> "true",
"hoodie.metadata.record.index.min.filegroup.count " -> "20", # in trial
"hoodie.metadata.record.index.max.filegroup.count" -> "5000" )
**Stacktrace**
```
24/02/02 13:51:46 INFO BlockManagerInfo: Added broadcast_86_piece0 in memory
on 10.224.52.183:42743 (size: 161.7 KiB, free: 413.7 MiB)
24/02/02 13:51:46 INFO BlockManagerInfo: Added broadcast_86_piece0 in memory
on 10.224.50.139:39367 (size: 161.7 KiB, free: 413.7 MiB)
24/02/02 13:51:46 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 40 to 10.224.50.139:55724
24/02/02 13:51:46 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 40 to 10.224.52.183:34940
24/02/02 13:51:46 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 40 to 10.224.50.159:33310
24/02/02 13:51:47 INFO TaskSetManager: Starting task 9.0 in stage 148.0 (TID
553) (10.224.53.172, executor 3, partition 9, NODE_LOCAL, 7189 bytes)
24/02/02 13:51:47 WARN TaskSetManager: Lost task 1.0 in stage 148.0 (TID
545) (10.224.53.172 executor 3): org.apache.hudi.exception.HoodieException:
org.apache.hudi.exception.HoodieException: Error occurs when executing map
at
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
Source)
at
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
Source)
at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
at
java.base/java.util.concurrent.ForkJoinTask.getThrowableException(Unknown
Source)
at java.base/java.util.concurrent.ForkJoinTask.reportException(Unknown
Source)
at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at
org.apache.hudi.common.engine.HoodieLocalEngineContext.map(HoodieLocalEngineContext.java:84)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:270)
at
org.apache.hudi.metadata.BaseTableMetadata.readRecordIndex(BaseTableMetadata.java:296)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:170)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:157)
at
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsToPair$1(JavaRDDLike.scala:186)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:853)
at
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:853)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.hudi.exception.HoodieException: Error occurs when
executing map
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown
Source)
at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown
Source)
Caused by: org.apache.hudi.exception.HoodieException: Exception when reading
log file
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:414)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.scanByFullKeys(HoodieMergedLogRecordScanner.java:160)
at
org.apache.hudi.metadata.HoodieMetadataLogRecordReader.getRecordsByKeys(HoodieMetadataLogRecordReader.java:108)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.readLogRecords(HoodieBackedTableMetadata.java:327)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:304)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
... 13 more
Caused by: java.lang.ClassCastException: class
org.apache.avro.generic.GenericData$Record cannot be cast to class
org.apache.hudi.avro.model.HoodieDeleteRecordList
(org.apache.avro.generic.GenericData$Record is in unnamed module of loader
'app'; org.apache.hudi.avro.model.HoodieDeleteRecordList is in unnamed module
of loader org.apache.spark.util.MutableURLClassLoader @727177d3)
at
org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:160)
at
org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:115)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:828)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:403)
... 20 more
24/02/02 13:51:47 INFO TaskSetManager: Starting task 7.0 in stage 148.0 (TID
554) (10.224.51.194, executor 4, partition 7, NODE_LOCAL, 7189 bytes)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 3.0 in stage 148.0 (TID
547) in 587 ms on 10.224.51.194 (executor 4) (1/10)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 2.0 in stage 148.0 (TID
546) in 684 ms on 10.224.52.197 (executor 2) (2/10)
24/02/02 13:51:47 INFO TaskSetManager: Starting task 1.1 in stage 148.0 (TID
555) (10.224.50.139, executor 1, partition 1, NODE_LOCAL, 7189 bytes)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 8.0 in stage 148.0 (TID
552) in 596 ms on 10.224.50.139 (executor 1) (3/10)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 0.0 in stage 148.0 (TID
550) in 690 ms on 10.224.50.159 (executor 6) (4/10)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 4.0 in stage 148.0 (TID
548) in 789 ms on 10.224.53.213 (executor 5) (5/10)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 6.0 in stage 148.0 (TID
551) in 799 ms on 10.224.52.183 (executor 7) (6/10)
24/02/02 13:51:47 WARN TaskSetManager: Lost task 9.0 in stage 148.0 (TID
553) (10.224.53.172 executor 3): org.apache.hudi.exception.HoodieException:
Error occurs when executing map
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.helpCC(Unknown
Source)
at
java.base/java.util.concurrent.ForkJoinPool.externalHelpComplete(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.tryExternalHelp(Unknown
Source)
at
java.base/java.util.concurrent.ForkJoinTask.externalAwaitDone(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doInvoke(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at
org.apache.hudi.common.engine.HoodieLocalEngineContext.map(HoodieLocalEngineContext.java:84)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:270)
at
org.apache.hudi.metadata.BaseTableMetadata.readRecordIndex(BaseTableMetadata.java:296)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:170)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:157)
at
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsToPair$1(JavaRDDLike.scala:186)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:853)
at
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:853)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.hudi.exception.HoodieException: Exception when reading
log file
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:414)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.scanByFullKeys(HoodieMergedLogRecordScanner.java:160)
at
org.apache.hudi.metadata.HoodieMetadataLogRecordReader.getRecordsByKeys(HoodieMetadataLogRecordReader.java:108)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.readLogRecords(HoodieBackedTableMetadata.java:327)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:304)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
... 40 more
Caused by: java.lang.ClassCastException: class
org.apache.avro.generic.GenericData$Record cannot be cast to class
org.apache.hudi.avro.model.HoodieDeleteRecordList
(org.apache.avro.generic.GenericData$Record is in unnamed module of loader
'app'; org.apache.hudi.avro.model.HoodieDeleteRecordList is in unnamed module
of loader org.apache.spark.util.MutableURLClassLoader @727177d3)
at
org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:160)
at
org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:115)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:828)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:403)
... 47 more
24/02/02 13:51:47 INFO TaskSetManager: Starting task 9.1 in stage 148.0 (TID
556) (10.224.50.159, executor 6, partition 9, NODE_LOCAL, 7189 bytes)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 7.0 in stage 148.0 (TID
554) in 414 ms on 10.224.51.194 (executor 4) (7/10)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 1.1 in stage 148.0 (TID
555) in 512 ms on 10.224.50.139 (executor 1) (8/10)
24/02/02 13:51:47 INFO TaskSetManager: Finished task 9.1 in stage 148.0 (TID
556) in 403 ms on 10.224.50.159 (executor 6) (9/10)
24/02/02 13:51:48 INFO TaskSetManager: Finished task 5.0 in stage 148.0 (TID
549) in 1906 ms on 10.224.52.235 (executor 8) (10/10)
... 36 more
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
at scala.Option.foreach(Option.scala:407)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at org.apache.spark.rdd.RDD.collect(RDD.scala:1018)
at
org.apache.spark.rdd.PairRDDFunctions.$anonfun$countByKey$1(PairRDDFunctions.scala:367)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at
org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:367)
at
org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:314)
at
org.apache.hudi.data.HoodieJavaPairRDD.countByKey(HoodieJavaPairRDD.java:105)
at
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.buildProfile(BaseSparkCommitActionExecutor.java:200)
at
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:174)
at
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:86)
at
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:67)
... 36 more
Caused by: org.apache.hudi.exception.HoodieException: Error occurs when
executing map
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doInvoke(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at
org.apache.hudi.common.engine.HoodieLocalEngineContext.map(HoodieLocalEngineContext.java:84)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:270)
at
org.apache.hudi.metadata.BaseTableMetadata.readRecordIndex(BaseTableMetadata.java:296)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:170)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:157)
at
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsToPair$1(JavaRDDLike.scala:186)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:853)
at
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:853)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: class
org.apache.avro.generic.GenericData$Record cannot be cast to class
org.apache.avro.specific.SpecificRecordBase
(org.apache.avro.generic.GenericData$Record and
org.apache.avro.specific.SpecificRecordBase are in unnamed module of loader
'app')
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209)
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:1355)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$37(HoodieTableMetadataUtil.java:1284)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown
Source)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown
Source)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1283)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:473)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:429)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:414)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:291)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
... 36 more
24/02/02 13:54:34 INFO HoodieStreamingSink: Retrying the failed micro batch
id=6 ...
24/02/02 13:54:34 INFO TransactionManager: Transaction manager closed
24/02/02 13:54:34 INFO AsyncCleanerService: Shutting down async clean
service...
24/02/02 13:54:34 INFO TransactionManager: Transaction manager closed
24/02/02 13:54:34 ERROR HoodieStreamingSink: Micro batch id=6 threw
following expections,aborting streaming app to avoid data loss:
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit
time 20240202135415589
at
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:74)
at
org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:44)
at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:114)
at
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:103)
at
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:142)
at
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:224)
at
org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:431)
at
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
at
org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$2(HoodieStreamingSink.scala:138)
at scala.util.Try$.apply(Try.scala:213)
at
org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$1(HoodieStreamingSink.scala:130)
at
org.apache.hudi.HoodieStreamingSink.retry(HoodieStreamingSink.scala:234)
at
org.apache.hudi.HoodieStreamingSink.addBatch(HoodieStreamingSink.scala:129)
at
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:729)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:726)
at
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:411)
at
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:409)
at
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67)
at
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:726)
at
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:284)
at
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:411)
at
org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:409)
at
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67)
at
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:247)
at
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67)
at
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:237)
at
org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:306)
at
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:284)
at
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:207)
Caused by: org.apache.spark.SparkException: Job aborted due to stage
failure: Task 5 in stage 256.0 failed 4 times, most recent failure: Lost task
5.3 in stage 256.0 (TID 1220) (10.224.50.139 executor 1):
org.apache.hudi.exception.HoodieException: Error occurs when executing map
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doInvoke(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at
org.apache.hudi.common.engine.HoodieLocalEngineContext.map(HoodieLocalEngineContext.java:84)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:270)
at
org.apache.hudi.metadata.BaseTableMetadata.readRecordIndex(BaseTableMetadata.java:296)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:170)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:157)
at
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsToPair$1(JavaRDDLike.scala:186)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:853)
at
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:853)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: class
org.apache.avro.generic.GenericData$Record cannot be cast to class
org.apache.avro.specific.SpecificRecordBase
(org.apache.avro.generic.GenericData$Record and
org.apache.avro.specific.SpecificRecordBase are in unnamed module of loader
'app')
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209)
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:1355)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$37(HoodieTableMetadataUtil.java:1284)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown
Source)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown
Source)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1283)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:473)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:429)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:414)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:291)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
... 36 more
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
at
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
at scala.Option.foreach(Option.scala:407)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328)
at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at org.apache.spark.rdd.RDD.collect(RDD.scala:1018)
at
org.apache.spark.rdd.PairRDDFunctions.$anonfun$countByKey$1(PairRDDFunctions.scala:367)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
at
org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:367)
at
org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:314)
at
org.apache.hudi.data.HoodieJavaPairRDD.countByKey(HoodieJavaPairRDD.java:105)
at
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.buildProfile(BaseSparkCommitActionExecutor.java:200)
at
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:174)
at
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:86)
at
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:67)
... 36 more
Caused by: org.apache.hudi.exception.HoodieException: Error occurs when
executing map
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:40)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown
Source)
at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doInvoke(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at
java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at
org.apache.hudi.common.engine.HoodieLocalEngineContext.map(HoodieLocalEngineContext.java:84)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:270)
at
org.apache.hudi.metadata.BaseTableMetadata.readRecordIndex(BaseTableMetadata.java:296)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:170)
at
org.apache.hudi.index.SparkMetadataTableRecordIndex$RecordIndexFileGroupLookupFunction.call(SparkMetadataTableRecordIndex.java:157)
at
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsToPair$1(JavaRDDLike.scala:186)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:853)
at
org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:853)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassCastException: class
org.apache.avro.generic.GenericData$Record cannot be cast to class
org.apache.avro.specific.SpecificRecordBase
(org.apache.avro.generic.GenericData$Record and
org.apache.avro.specific.SpecificRecordBase are in unnamed module of loader
'app')
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209)
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:1355)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$37(HoodieTableMetadataUtil.java:1284)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown
Source)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown
Source)
at
java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown
Source)
at
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown
Source)
at
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown
Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1283)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:473)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:429)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:414)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:291)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
at
org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
... 36 more
24/02/02 13:54:34 INFO SparkContext: Invoking stop() from shutdown hook
24/02/02 13:54:34 INFO SparkContext: SparkContext is stopping with exitCode
0.
24/02/02 13:54:35 INFO SparkUI: Stopped Spark web UI at
http://cdp-spark-hudi-poc-bd39e18d6a0fdb60-driver-svc.qbm-cdp-aggregation-spark.svc:4040
24/02/02 13:54:35 INFO KubernetesClusterSchedulerBackend: Shutting down all
executors
24/02/02 13:54:35 INFO
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each
executor to shut down
24/02/02 13:54:35 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client
has been closed.
24/02/02 13:54:35 ERROR TransportRequestHandler: Error while invoking
RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
at
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:178)
at
org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:150)
at
org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:690)
at
org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:274)
at
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
at
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
at
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
24/02/02 13:54:35 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
24/02/02 13:54:35 INFO MemoryStore: MemoryStore cleared
24/02/02 13:54:35 INFO BlockManager: BlockManager stopped
24/02/02 13:54:35 INFO BlockManagerMaster: BlockManagerMaster stopped
24/02/02 13:54:35 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
24/02/02 13:54:35 INFO SparkContext: Successfully stopped SparkContext
24/02/02 13:54:35 INFO ShutdownHookManager: Shutdown hook called
24/02/02 13:54:35 INFO ShutdownHookManager: Deleting directory
/var/data/spark-03498e5f-b96e-44c8-bbf1-1eee297285b4/spark-73e9b280-ade7-4575-a541-20f23b0844c2
24/02/02 13:54:35 INFO ShutdownHookManager: Deleting directory
/tmp/spark-146b1b36-749b-453a-b4d3-8cfeb9ef192c
```
spark Ui log
<img width="1512" alt="image"
src="https://github.com/apache/hudi/assets/115445723/33e2b639-fade-4085-97cf-61a04f6fa96c">
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]