[ https://issues.apache.org/jira/browse/HUDI-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3595:
--------------------------------------
    Fix Version/s: 0.11.0

> Empty batch of data resulting in Deltastreamer failure
> ------------------------------------------------------
>
>                 Key: HUDI-3595
>                 URL: https://issues.apache.org/jira/browse/HUDI-3595
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.11.0
>
>
> When an empty batch is consumed by Deltastreamer (and the checkpoint is 
> moved), the schema serialized into the commit metadata ends up being the 
> literal string "null". This causes failures downstream when callers fetch 
> the table schema.
>  
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting 
> bucketType UPDATE for partition :0
>       at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:333)
>       at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:339)
>       at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:178)
>       at 
> org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
>       at 
> org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
>       at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
>       at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
>       at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>       at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>       at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
>       at 
> org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
>       at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
>       at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
>       at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
>       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
>       at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>       at org.apache.spark.scheduler.Task.run(Task.scala:131)
>       at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>       ... 3 more
> Caused by: org.apache.avro.AvroRuntimeException: Not a record: "null"
>       at org.apache.avro.Schema.getFields(Schema.java:279)
>       at 
> org.apache.hudi.avro.HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.java:209)
>       at 
> org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:115)
>       at 
> org.apache.hudi.io.HoodieWriteHandle.<init>(HoodieWriteHandle.java:104)
>       at 
> org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:122)
>       at 
> org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:115)
>       at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:381)
>       at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:352)
>       at 
> org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
>       at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:326)
>       ... 29 more
> 22/03/08 04:33:20 INFO ShutdownHookManager: Shutdown hook called {code}
>  
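The failure mode above (Avro rejecting the literal "null" schema string with "Not a record") suggests a defensive check before the commit-metadata schema is handed to the Avro parser. The sketch below is a hypothetical illustration, not Hudi's actual fix; the class and method names (`NullSchemaGuard`, `isNullSchema`) are invented for this example:

```java
// Hypothetical sketch (not Hudi's actual fix): treat the literal "null"
// schema string that an empty Deltastreamer batch can leave in the commit
// metadata as "no schema", instead of passing it to the Avro schema parser,
// where Schema.getFields() would throw "Not a record".
public class NullSchemaGuard {

    // Returns true when the schema string should be treated as absent:
    // null reference, blank string, or the literal "null" token.
    static boolean isNullSchema(String schemaStr) {
        if (schemaStr == null) {
            return true;
        }
        String trimmed = schemaStr.trim();
        return trimmed.isEmpty()
                || "null".equals(trimmed)
                || "\"null\"".equals(trimmed);
    }

    public static void main(String[] args) {
        // Schema left behind by an empty batch: should be skipped.
        System.out.println(isNullSchema("null"));
        // A real record schema: should be used.
        System.out.println(isNullSchema(
                "{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}"));
    }
}
```

A caller holding such a schema string could fall back to reading the schema from the latest data file instead of failing the upsert.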



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
