Shagish opened a new issue, #7577: URL: https://github.com/apache/hudi/issues/7577
Hi Team,

We are facing an issue in our production environment with a Hudi table. The application had been running fine and writing to the Hudi table, when it suddenly failed. Each time we try to bring the application back up, it runs for 5-10 minutes and then throws an error while writing files. Details below.

**Environment**

- Table type (COW or MOR): MOR
- Spark version: 3.2.1
- Hudi version: 0.11.0
- Where are the Spark jobs running: EMR 6.7.0
- Hadoop version:

**What were you trying to do?**

The application is a Spark Hudi streaming job. It reads messages from a Kafka topic, processes them, and writes to a Hudi table. The application runs for a while and then, while writing data to the Hudi table, fails with a FileNotFoundException. The file it reports as missing is a very old (12/01/2022) parquet file.

**What have you tried?**

We tried changing the Hudi properties and restarting the step, but it still fails.

**Log details**

```
2022-12-22 22:52:32 INFO YarnClusterScheduler:57 - Killing all running tasks in stage 497: Stage cancelled
2022-12-22 22:52:33 INFO DAGScheduler:57 - ResultStage 497 (start at Application.java:101) failed in 2.542 s due to Job aborted due to stage failure: Task 0 in stage 497.0 failed 4 times, most recent failure: Lost task 0.3 in stage 497.0 (TID 762) (ip-10-220-71-253.emr.awsw.cld.ds.dtvops.net executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:335)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:246)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://XXXXXXXXX/up_md_info/table/df245ac4-eafb-491b-8f5f-fcbb920b30ee-0_20-1773-8703_20221201102406279.parquet'
	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:349)
	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
	... 29 more
Caused by: java.io.FileNotFoundException: No such file or directory 's3://XXXXXXXXX/up_md_info/table/df245ac4-eafb-491b-8f5f-fcbb920b30ee-0_20-1773-8703_20221201102406279.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:521)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:613)
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:337)
	at org.apache.hudi.io.storage.HoodieParquetReader.getRecordIterator(HoodieParquetReader.java:70)
	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:134)
	... 33 more
Driver stacktrace:
2022-12-22 22:52:33 INFO DAGScheduler:57 - Job 333 failed: start at Application.java:101, took 2.693215 s
2022-12-22 22:52:33 ERROR MicroBatchExecution:94 - Query [id = d6b8649b-2b4a-43e3-9140-c997af8e1214, runId = 6c44b42b-3bc3-4e1f-a6a5-0cd542e11cd2] terminated with error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 497.0 failed 4 times, most recent failure: Lost task 0.3 in stage 497.0 (TID 762) (ip-10-220-71-253.emr.awsw.cld.ds.dtvops.net executor 2): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:335)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:246)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:133)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://XXXXXXXXX/up_md_info/table/df245ac4-eafb-491b-8f5f-fcbb920b30ee-0_20-1773-8703_20221201102406279.parquet'
	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:149)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:349)
	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
	... 29 more
Caused by: java.io.FileNotFoundException: No such file or directory 's3://XXXXXXXXX/up_md_info/table/df245ac4-eafb-491b-8f5f-fcbb920b30ee-0_20-1773-8703_20221201102406279.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:521)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:613)
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:337)
	at org.apache.hudi.io.storage.HoodieParquetReader.getRecordIterator(HoodieParquetReader.java:70)
	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:134)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
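As a side note for anyone triaging a similar error: the name of the missing file itself tells you when it was written. Hudi base files follow the pattern `<fileId>_<writeToken>_<instantTime>.parquet`, where the instant time is the timestamp of the commit that produced the file. A minimal sketch of decoding it (the helper name is mine, not part of Hudi):

```python
# Hedged sketch: split a Hudi base-file name into its parts and decode the
# commit instant. Assumes the 0.x naming scheme <fileId>_<writeToken>_<instantTime>.parquet,
# where the instant starts with yyyyMMddHHmmss (possibly followed by millis).
from datetime import datetime

def parse_hudi_base_file(name: str) -> dict:
    stem = name.rsplit(".", 1)[0]                    # drop the .parquet extension
    file_id, write_token, instant = stem.rsplit("_", 2)
    commit_ts = datetime.strptime(instant[:14], "%Y%m%d%H%M%S")
    return {"file_id": file_id, "write_token": write_token,
            "instant_time": instant, "commit_ts": commit_ts}

info = parse_hudi_base_file(
    "df245ac4-eafb-491b-8f5f-fcbb920b30ee-0_20-1773-8703_20221201102406279.parquet")
print(info["file_id"])    # df245ac4-eafb-491b-8f5f-fcbb920b30ee-0
print(info["commit_ts"])  # 2022-12-01 10:24:06
```

The decoded instant (2022-12-01 10:24:06) matches the reporter's observation that the missing file dates from 12/01/2022. A base file that old going missing under an active writer is a pattern often associated with the cleaner or archival removing a file slice that a restarted job (e.g. via an outdated streaming checkpoint) still references; that is only a hypothesis here, not something this issue confirms.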
