howardcho opened a new issue, #11454:
URL: https://github.com/apache/hudi/issues/11454

   Last night I was running multiple upsert Glue jobs against a single table (backfilling missing data) when I started getting this error:
   `An error occurred while calling o333.save. Failed to apply clean commit to metadata`
   
   Now none of my jobs complete successfully. I retried this morning with my hourly incremental job and it failed with the same error. I presume I accidentally had two jobs writing to the same partition, which caused some sort of deadlock. Could someone please help me get the table back into a usable state?
   
   Hudi version: 0.14.0
   Config:
   ```
   {'hoodie.table.name': 'xxx',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.recordkey.field': 'received_year,received_month,received_day,request_uuid',
    'hoodie.datasource.write.precombine.field': 'nats_timestamp',
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.metadata.record.index.enable': False,
    'hoodie.index.type': 'BLOOM',
    'hoodie.parquet.max.file.size': 536870912,
    'hoodie.parquet.small.file.limit': 104857600,
    'hoodie.metadata.enable': 'true',
    'hoodie.metadata.index.async': 'false',
    'hoodie.metadata.index.column.stats.enable': 'true',
    'hoodie.metadata.index.check.timeout.seconds': '60',
    'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
    'hoodie.write.lock.provider': 'org.apache.hudi.client.transaction.lock.InProcessLockProvider',
    'hoodie.datasource.write.schema.allow.auto.evolution.column.drop': True,
    'hoodie.datasource.write.partitionpath.field': 'received_year,received_month,received_day',
    'hoodie.datasource.hive_sync.partition_fields': 'received_year,received_month,received_day',
    'hoodie.clean.automatic': 'true',
    'hoodie.clean.async': 'false',
    'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
    'hoodie.cleaner.fileversions.retained': '3',
    'hoodie-conf hoodie.cleaner.parallelism': '200',
    'hoodie.cleaner.commits.retained': 5,
    'hoodie.parquet.compression.codec': 'gzip',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.database': 'product_usage',
    'hoodie.datasource.hive_sync.table': 'usage',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.support_timestamp': True,
    'hive_sync.support_timestamp': True,
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator'}
   ```
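   
   For reference, the options above are passed straight through to the DataFrame writer in the Glue script, roughly like this (a sketch; the DataFrame name and table path are placeholders):
   ```python
   # Minimal sketch of the write call; `df` and the S3 path are placeholders.
   hudi_options = {
       'hoodie.table.name': 'xxx',
       'hoodie.datasource.write.operation': 'upsert',
       # ... plus the remaining options shown above ...
   }
   
   (df.write
       .format('hudi')
       .options(**hudi_options)
       .mode('append')
       .save('s3://xxx/path/to/table'))
   ```
   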
    * Stack trace below
   
   I re-ran a single incremental job with `hoodie.metadata.enable=false` and it worked, but the older backfill jobs continue to fail when re-run.
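   
   Concretely, the successful run just overrode the metadata flag on the same writer call (a sketch; `hudi_options` and the path are as above):
   ```python
   # One-off override that let the incremental job succeed (sketch).
   opts = dict(hudi_options)              # the config dict shown earlier
   opts['hoodie.metadata.enable'] = 'false'
   
   (df.write
       .format('hudi')
       .options(**opts)
       .mode('append')
       .save('s3://xxx/path/to/table'))
   ```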
   
   I then tried lowering `hoodie-conf hoodie.cleaner.parallelism` (200 → 10) and raising `hoodie.cleaner.commits.retained` (5 → 20):
   ```
   Writing Hudi 0.14.0 data with method: upsert
   {'hoodie.table.name': 'xxx',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.recordkey.field': 'received_year,received_month,received_day,request_uuid',
    'hoodie.datasource.write.precombine.field': 'nats_timestamp',
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.metadata.record.index.enable': False,
    'hoodie.index.type': 'BLOOM',
    'hoodie.parquet.max.file.size': 536870912,
    'hoodie.parquet.small.file.limit': 104857600,
    'hoodie.metadata.enable': 'true',
    'hoodie.metadata.index.async': 'false',
    'hoodie.metadata.index.column.stats.enable': 'true',
    'hoodie.metadata.index.check.timeout.seconds': '60',
    'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
    'hoodie.write.lock.provider': 'org.apache.hudi.client.transaction.lock.InProcessLockProvider',
    'hoodie.datasource.write.schema.allow.auto.evolution.column.drop': True,
    'hoodie.datasource.write.partitionpath.field': 'received_year,received_month,received_day',
    'hoodie.datasource.hive_sync.partition_fields': 'received_year,received_month,received_day',
    'hoodie.clean.automatic': 'true',
    'hoodie.clean.async': 'false',
    'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
    'hoodie.cleaner.fileversions.retained': '3',
    'hoodie-conf hoodie.cleaner.parallelism': 10,
    'hoodie.cleaner.commits.retained': 20,
    'hoodie.parquet.compression.codec': 'gzip',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.database': 'product_usage',
    'hoodie.datasource.hive_sync.table': 'usage',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.support_timestamp': True,
    'hive_sync.support_timestamp': True,
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator'}
   ```
   
   Unfortunately, it still failed with `An error occurred while calling o333.save. Failed to apply clean commit to metadata`.
   
   @xushiyan commented:
   ```
   I copied the 2 lines from the log you shared above. It shows the root cause, which points to a metadata table issue with column stats.
   ```
   
   I'm still unclear on how I can alleviate the issue. Any help would be 
greatly appreciated.
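   
   One recovery path I'm considering, if someone can confirm it's safe, is deleting the metadata table under `.hoodie/metadata` so that the next write with `hoodie.metadata.enable=true` re-bootstraps it. A hedged sketch with boto3 (the bucket and prefix below are placeholders; I would copy the prefix aside first):
   ```python
   import boto3
   
   # Hypothetical cleanup: remove the metadata table so Hudi rebuilds it
   # on the next write. Bucket name and table prefix are placeholders.
   s3 = boto3.resource('s3')
   bucket = s3.Bucket('xxx')
   bucket.objects.filter(Prefix='path/to/table/.hoodie/metadata/').delete()
   ```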
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Unsure, but perhaps running multiple Glue loading jobs in parallel (see the lock-provider sketch below)
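   
   If that's the trigger, my understanding is that `InProcessLockProvider` only serializes writers within a single JVM, so separate Glue jobs would not actually be coordinated. A sketch of the cross-process lock settings I believe multi-writer OCC needs (the DynamoDB table, key, and region below are placeholders):
   ```python
   # Hypothetical lock-provider settings for writers running in separate
   # processes; all DynamoDB values below are placeholders.
   lock_options = {
       'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
       'hoodie.write.lock.provider':
           'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider',
       'hoodie.write.lock.dynamodb.table': 'hudi-locks',
       'hoodie.write.lock.dynamodb.partition_key': 'usage-table',
       'hoodie.write.lock.dynamodb.region': 'us-east-1',
   }
   ```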
   
   **Expected behavior**
   
   I expected my ingestion jobs to complete successfully.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.13 (Hudi bundle for Scala 2.12)
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   This is impacting a new production pipeline, which cannot go live until this issue is resolved.
   
   **Stacktrace**
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o333.save.
   : org.apache.hudi.exception.HoodieException: Failed to apply clean commit to 
metadata
       at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:91)
       at 
org.apache.hudi.table.action.clean.CleanActionExecutor.runClean(CleanActionExecutor.java:227)
       at 
org.apache.hudi.table.action.clean.CleanActionExecutor.runPendingClean(CleanActionExecutor.java:193)
       at 
org.apache.hudi.table.action.clean.CleanActionExecutor.execute(CleanActionExecutor.java:263)
       at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.clean(HoodieSparkCopyOnWriteTable.java:291)
       at 
org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:762)
       at 
org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:861)
       at 
org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:834)
       at 
org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:865)
       at 
org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:599)
       at 
org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:578)
       at 
org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:248)
       at 
org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104)
       at 
org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1059)
       at 
org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:441)
       at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
       at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
       at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
       at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
       at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
       at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
       at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
       at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
       at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
       at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
       at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
       at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
       at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
       at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
       at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
       at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
       at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
       at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
       at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
       at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
       at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
       at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
       at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
       at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
       at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
       at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
       at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591)
       at 
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96)
       at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83)
       at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81)
       at 
org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124)
       at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
       at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
       at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
       at py4j.Gateway.invoke(Gateway.java:282)
       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
       at py4j.commands.CallCommand.execute(CallCommand.java:79)
       at 
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
       at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0 in stage 64.0 failed 4 times, most recent failure: Lost task 
0.3 in stage 64.0 (TID 707) (172.39.110.44 executor 1): 
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType 
UPDATE for partition :0
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
       at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
       at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
       at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907)
       at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
       at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378)
       at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1525)
       at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1435)
       at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1499)
       at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1322)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:327)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:138)
       at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while 
appending records to 
s3://xxx/.hoodie/metadata/column_stats/.col-stats-0000-0_20240612200231775001.log.31_0-64-707
       at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
       at 
org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
       at 
org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
       ... 28 more
   Caused by: org.apache.hudi.exception.HoodieException: Writing multiple 
records with same key +Dnh4tS9NTQ=H+lGtHeAeTg=pd5P6Jjt2OKx1fRDxMSeJQ== not 
supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
       at 
org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:143)
       at 
org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:115)
       at 
org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
       at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:479)
       ... 31 more
   Driver stacktrace:
       at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2863)
       at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2799)
       at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2798)
       at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
       at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
       at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2798)
       at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1239)
       at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1239)
       at scala.Option.foreach(Option.scala:407)
       at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1239)
       at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3051)
       at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2993)
       at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2982)
       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
       at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1009)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2229)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2250)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2269)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2294)
       at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1021)
       at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
       at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
       at org.apache.spark.rdd.RDD.collect(RDD.scala:1020)
       at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
       at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
       at 
org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
       at 
org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:177)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.setCommitMetadata(BaseSparkCommitActionExecutor.java:289)
       at 
org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:197)
       at 
org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:183)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.updateIndexAndCommitIfNeeded(BaseSparkCommitActionExecutor.java:279)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:184)
       at 
org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44)
       at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:126)
       at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:88)
       at 
org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:156)
       at 
org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:63)
       at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1132)
       at 
org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:117)
       at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:855)
       at 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:933)
       at 
org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:86)
       ... 63 more
   Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting 
bucketType UPDATE for partition :0
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
       at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
       at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
       at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907)
       at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
       at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378)
       at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1525)
       at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1435)
       at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1499)
       at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1322)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:327)
       at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:138)
       at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       ... 1 more
   Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while 
appending records to 
s3://xxx/.hoodie/metadata/column_stats/.col-stats-0000-0_20240612200231775001.log.31_0-64-707
       at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
       at 
org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
       at 
org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
       at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
       ... 28 more
   Caused by: org.apache.hudi.exception.HoodieException: Writing multiple 
records with same key +Dnh4tS9NTQ=H+lGtHeAeTg=pd5P6Jjt2OKx1fRDxMSeJQ== not 
supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
       at 
org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:143)
       at 
org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:115)
       at 
org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
       at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:479)
       ... 31 more
   ```
   
   

