Limess opened a new issue, #6367: URL: https://github.com/apache/hudi/issues/6367
**Describe the problem you faced**

We're seeing some weird behaviour since upgrading from Hudi 0.10.0-amzn (EMR 6.6.0) to Hudi 0.11.0-amzn (EMR 6.7.0): Deltastreamer runs still appear to succeed overall, but each now contains a failed Spark job.

Is this important, and what does the failure mean here? It looks like a column stats issue only, which I assume is not a hard failure; is there action we can take to fix it?

For reference, as part of the upgrade we enabled `hoodie.metadata.enable=true`, `hoodie.metadata.index.column.stats.enable=true`, and `hoodie.metadata.index.bloom.filter.enable=true`.
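In Deltastreamer these settings can be passed as `--hoodie-conf` flags or via a properties file. For illustration only, a minimal sketch of the equivalent Spark datasource write with the same three settings; the table name, key fields, and path below are hypothetical placeholders, not our actual configuration:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical example: enable the metadata table plus the column stats
// and bloom filter indexes on a Hudi datasource write.
def upsertWithMetadataIndexes(df: DataFrame): Unit = {
  df.write.format("hudi")
    .option("hoodie.table.name", "example_table")            // placeholder
    .option("hoodie.datasource.write.recordkey.field", "id") // placeholder
    .option("hoodie.datasource.write.precombine.field", "ts") // placeholder
    // The three settings enabled during the upgrade:
    .option("hoodie.metadata.enable", "true")
    .option("hoodie.metadata.index.column.stats.enable", "true")
    .option("hoodie.metadata.index.bloom.filter.enable", "true")
    .mode(SaveMode.Append)
    .save("s3://example-bucket/example_table") // placeholder
}
```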
Stack trace from the failed stage:

```
Job aborted due to stage failure: Task 5 in stage 69.0 failed 10 times, most recent failure: Lost task 5.9 in stage 69.0 (TID 9119) (ip-10-0-35-115.eu-west-1.compute.internal executor 3): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :5
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to s3://<REDACTED>/.hoodie/metadata/column_stats/.col-stats-0000_20220809191455283.log.3_5-69-9119
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:410)
	at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:382)
	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:84)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
	... 28 more
Caused by: java.lang.IllegalStateException: Writing multiple records with same key not supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
	at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:67)
	at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:136)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:131)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:158)
	at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:404)
	... 31 more
Driver stacktrace:
```

**To Reproduce**

Steps to reproduce the behavior: unknown at this time.

**Expected behavior**

No error occurs in the Spark job.

**Environment Description**

* Hudi version : 0.11.0-amzn
* Spark version : 3.2.1-amzn
* Hive version : 3.1.3
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**
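The root-cause exception comes from `HoodieHFileDataBlock.serializeRecords`, which refuses to write two records with the same key into one HFile data block; the failing append is to a `column_stats` log file in the metadata table, which is why the error appeared only after the column stats index was enabled. As a rough illustration of that kind of uniqueness check, here is a minimal sketch, not the actual Hudi source:

```scala
import scala.collection.mutable

// Illustrative sketch only, not Hudi's implementation. HFile data blocks
// are keyed, and Hudi rejects a block that would contain the same record
// key twice, throwing IllegalStateException via ValidationUtils.checkState.
def serializeRecords(records: Seq[(String, Array[Byte])]): Seq[(String, Array[Byte])] = {
  val seen = mutable.HashSet[String]()
  records.sortBy(_._1).map { case (key, value) =>
    if (!seen.add(key)) {
      throw new IllegalStateException(
        s"Writing multiple records with same key not supported: $key")
    }
    (key, value) // Hudi would serialize this entry into the HFile block here
  }
}
```

So the open question is why duplicate column stats keys end up in a single log block after the upgrade.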
