dm-tran opened a new issue #2020: URL: https://github.com/apache/hudi/issues/2020
**Describe the problem you faced**

We are using Hudi 0.5.3 patched with https://github.com/apache/hudi/pull/1765, so that a compaction that previously failed is retried before new compactions are scheduled. When the compaction is retried, it fails with `java.io.FileNotFoundException`.

**To Reproduce**

I'm sorry, but I currently don't have a simple way to reproduce this problem. Here is how I got this error:

1. Initialize a Hudi table using Spark and "bulk insert".
2. Launch a Spark Structured Streaming application that consumes messages from Kafka and saves them to Hudi, using "upsert".

**Expected behavior**

Compaction should not fail.

**Environment Description**

* Hudi version : 0.5.3 patched with https://github.com/apache/hudi/pull/1765
* Spark version : 2.4.4 (EMR 6.0.0)
* Hive version : 3.1.2
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

- the throughput is around 15 messages per second.
- the Hudi table has around 20 partitions.
- there are no external processes that delete files from S3.
- the structured streaming job is run every 5 minutes with the following properties:

```
Map(
  "hoodie.upsert.shuffle.parallelism" -> "200",
  "hoodie.compact.inline" -> "true",
  "hoodie.compact.inline.max.delta.commits" -> "1",
  "hoodie.filesystem.view.incr.timeline.sync.enable" -> "true",
  HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
  HIVE_STYLE_PARTITIONING_OPT_KEY -> "true",
  TABLE_TYPE_OPT_KEY -> MOR_TABLE_TYPE_OPT_VAL,
  OPERATION_OPT_KEY -> UPSERT_OPERATION_OPT_VAL,
  CLEANER_INCREMENTAL_MODE -> "true",
  CLEANER_POLICY_PROP -> HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name(),
  CLEANER_FILE_VERSIONS_RETAINED_PROP -> "12"
)
```

Output of `compactions show all` with Hudi CLI:

```
╔═════════════════════════╤═══════════╤═══════════════════════════════╗
║ Compaction Instant Time │ State     │ Total FileIds to be Compacted ║
╠═════════════════════════╪═══════════╪═══════════════════════════════╣
║ 20200821154520          │ INFLIGHT  │ 57                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821153748          │ COMPLETED │ 56                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821152906          │ COMPLETED │ 50                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821152207          │ COMPLETED │ 52                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821151547          │ COMPLETED │ 57                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821151014          │ COMPLETED │ 48                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821150425          │ COMPLETED │ 54                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821145904          │ COMPLETED │ 49                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821145253          │ COMPLETED │ 60                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821144717          │ COMPLETED │ 55                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821144125          │ COMPLETED │ 59                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821143533          │ COMPLETED │ 56                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821142949          │ COMPLETED │ 55                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821142335          │ COMPLETED │ 59                            ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200821141741          │ COMPLETED │ 63                            ║
╚═════════════════════════╧═══════════╧═══════════════════════════════╝
```

Output of `cleans show` with Hudi CLI:

```
╔════════════════╤═════════════════════════╤═════════════════════╤══════════════════╗
║ CleanTime      │ EarliestCommandRetained │ Total Files Deleted │ Total Time Taken ║
╠════════════════╪═════════════════════════╪═════════════════════╪══════════════════╣
║ 20200821152814 │                         │ 619                 │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821152115 │                         │ 24                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821151459 │                         │ 4                   │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821150921 │                         │ 6                   │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821150334 │                         │ 97                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821145815 │                         │ 192                 │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821145201 │                         │ 128                 │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821144630 │                         │ 24                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821144033 │                         │ 14                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821143441 │                         │ 28                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821142858 │                         │ 114                 │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821142242 │                         │ 614                 │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821141650 │                         │ 79                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821141111 │                         │ 12                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821140501 │                         │ 38                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821135933 │                         │ 8                   │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821135412 │                         │ 147                 │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821134904 │                         │ 99                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821134339 │                         │ 77                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821133821 │                         │ 41                  │ -1               ║
╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
║ 20200821133227 │                         │ 1                   │ -1               ║
╚════════════════╧═════════════════════════╧═════════════════════╧══════════════════╝
```

**Stacktrace**

```
20/08/24 03:55:31 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
20/08/24 03:57:48 ERROR HoodieMergeOnReadTable: Rolling back instant [==>20200821154520__compaction__INFLIGHT]
20/08/24 03:58:03 WARN HoodieCopyOnWriteTable: Rollback finished without deleting inflight instant file. Instant=[==>20200821154520__compaction__INFLIGHT]
20/08/24 03:58:33 WARN TaskSetManager: Lost task 7.0 in stage 39.0 (TID 2576, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
	... 26 more
20/08/24 03:58:49 WARN TaskSetManager: Lost task 7.3 in stage 39.0 (TID 2582, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
	... 26 more
20/08/24 03:58:49 ERROR TaskSetManager: Task 7 in stage 39.0 failed 4 times; aborting job
20/08/24 03:58:49 ERROR MicroBatchExecution: Query [id = 418bbb3a-3def-4a20-987b-2ac7a0ca7004, runId = ff16cb78-6247-413f-bd94-afd1c3ef48ed] terminated with error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 39.0 failed 4 times, most recent failure: Lost task 7.3 in stage 39.0 (TID 2582, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
	... 26 more
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2041)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2029)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2028)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:966)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:966)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
	at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
	at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
	at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
	at org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1134)
	at org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1102)
	at org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1085)
	at org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1056)
	at org.apache.hudi.client.HoodieWriteClient.lambda$runEarlierInflightCompactions$3(HoodieWriteClient.java:524)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
	at org.apache.hudi.client.HoodieWriteClient.runEarlierInflightCompactions(HoodieWriteClient.java:521)
	at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:501)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
	at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at jp.ne.paypay.daas.dataprocessor.writer.EventsWriter$.saveToHudiTable(EventsWriter.scala:145)
	at jp.ne.paypay.daas.dataprocessor.MainProcessor$.processBatch(MainProcessor.scala:162)
	at jp.ne.paypay.daas.dataprocessor.MainProcessor$.$anonfun$main$4(MainProcessor.scala:90)
	at jp.ne.paypay.daas.dataprocessor.MainProcessor$.$anonfun$main$4$adapted(MainProcessor.scala:82)
	at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:536)
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
	at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:535)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
	at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
	at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
	at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
	at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
	at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
	... 26 more
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
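For reference, the ingestion pattern described in the issue (a Structured Streaming job that upserts Kafka messages into a MOR table with inline compaction) can be sketched roughly as below. This is a minimal sketch, not the reporter's actual job: the topic, table name, record-key and precombine fields, broker address, and checkpoint path are all hypothetical, and only a subset of the options from the `Map` above is shown.

```scala
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("hudi-streaming-upsert-sketch").getOrCreate()

// Base path of the Hudi table; the real path is elided in the issue.
val tablePath = "s3://myBucket/absolute_path_to"

// Upsert one micro-batch into the MOR table with inline compaction enabled.
def upsertBatch(batch: DataFrame, batchId: Long): Unit =
  batch.write
    .format("hudi")
    .option(HoodieWriteConfig.TABLE_NAME, "events")            // hypothetical table name
    .option(TABLE_TYPE_OPT_KEY, MOR_TABLE_TYPE_OPT_VAL)
    .option(OPERATION_OPT_KEY, UPSERT_OPERATION_OPT_VAL)
    .option(RECORDKEY_FIELD_OPT_KEY, "uuid")                   // hypothetical record key field
    .option(PRECOMBINE_FIELD_OPT_KEY, "ts")                    // hypothetical precombine field
    .option(PARTITIONPATH_FIELD_OPT_KEY, "daas_date")          // matches the partition column in the logs
    .option("hoodie.compact.inline", "true")
    .option("hoodie.compact.inline.max.delta.commits", "1")
    .mode("append")
    .save(tablePath)

spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")            // hypothetical broker
  .option("subscribe", "events")                               // hypothetical topic
  .load()
  // ... deserialize the Kafka value column into the table schema here ...
  .writeStream
  .foreachBatch(upsertBatch _)
  .option("checkpointLocation", "s3://myBucket/checkpoints/events")  // hypothetical
  .trigger(Trigger.ProcessingTime("5 minutes"))
  .start()
```

Note that with `hoodie.compact.inline.max.delta.commits` set to `1`, a compaction is scheduled and run inline after every delta commit, so the retried-compaction path added by #1765 is exercised on essentially every micro-batch.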
