[
https://issues.apache.org/jira/browse/HUDI-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-3595:
---------------------------------
Labels: pull-request-available (was: )
> Empty batch of data resulting in Deltastreamer failure
> ------------------------------------------------------
>
> Key: HUDI-3595
> URL: https://issues.apache.org/jira/browse/HUDI-3595
> Project: Apache Hudi
> Issue Type: Bug
> Components: deltastreamer
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.11.0
>
>
> When an empty batch is consumed by Deltastreamer (and the checkpoint is
> advanced), the schema serialized into the commit metadata is the literal
> string "null". This causes failures down the line when callers fetch the
> table schema from that commit.
>
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:333)
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:339)
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:178)
>     at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
>     at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
>     at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
>     at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
>     at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
>     at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
>     at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>     at org.apache.spark.scheduler.Task.run(Task.scala:131)
>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>     ... 3 more
> Caused by: org.apache.avro.AvroRuntimeException: Not a record: "null"
>     at org.apache.avro.Schema.getFields(Schema.java:279)
>     at org.apache.hudi.avro.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:209)
>     at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:115)
>     at org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:104)
>     at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:122)
>     at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:115)
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:381)
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:352)
>     at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:326)
>     ... 29 more
> 22/03/08 04:33:20 INFO ShutdownHookManager: Shutdown hook called {code}
>
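The failure mode above suggests one possible defensive fix: treat a literal "null" (or empty) schema string in commit metadata as "no schema available" and fall back to the last known good schema, instead of handing the string to the Avro parser. The sketch below illustrates that idea only; `isSchemaAbsent` and `resolveTableSchema` are hypothetical helper names, not actual Hudi APIs.

```java
// Minimal sketch of the guard idea, assuming the schema is carried around as a
// JSON string in commit metadata. Helper names are illustrative, not Hudi's.
public class SchemaGuard {
    static final String NULL_SCHEMA_LITERAL = "null";

    // True when the serialized schema carries no usable information
    // (missing, empty, or the literal string "null" written for an empty batch).
    static boolean isSchemaAbsent(String schemaStr) {
        return schemaStr == null
                || schemaStr.isEmpty()
                || NULL_SCHEMA_LITERAL.equals(schemaStr);
    }

    // Prefer the latest commit's schema, but fall back to a previously known
    // schema when the latest commit recorded "null" (the empty-batch case).
    static String resolveTableSchema(String latestCommitSchema, String lastKnownSchema) {
        return isSchemaAbsent(latestCommitSchema) ? lastKnownSchema : latestCommitSchema;
    }

    public static void main(String[] args) {
        String known = "{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}";
        // Empty-batch commit stored "null": resolve to the last known schema.
        System.out.println(resolveTableSchema("null", known));
    }
}
```

With a guard like this, `Schema.getFields` is never invoked on a parsed `"null"` schema, which is exactly the call that throws `AvroRuntimeException: Not a record: "null"` in the trace.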
--
This message was sent by Atlassian Jira
(v8.20.1#820001)