Zhujun-Vungle opened a new issue #697: Hi, about spark retry problem, still exists in 0.4.6 with consistency on
URL: https://github.com/apache/incubator-hudi/issues/697

Running version 0.4.6, we still get duplicate files from Spark task retries. The retries are triggered by an S3 "file not found" error. I haven't dug into the code details, but do we still generate a new file ID when a task is retried?

`2019-05-28 02:38:12 INFO MapOutputTrackerMasterEndpoint:54 - Asked to send map output locations for shuffle 1308 to 172.19.101.41:47318
2019-05-28 02:38:18 WARN TaskSetManager:66 - Lost task 1.0 in stage 4306.0 (TID 1021141, ip-172-19-101-139, executor 1): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieException: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
    at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:378)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.uber.hoodie.exception.HoodieException: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
    at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable.computeNext(CopyOnWriteLazyInsertIterable.java:106)
    at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable.computeNext(CopyOnWriteLazyInsertIterable.java:45)
    at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
    ... 20 more
Caused by: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
    at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
    at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable.computeNext(CopyOnWriteLazyInsertIterable.java:102)
    ... 22 more
Caused by: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:144)
    ... 23 more
Caused by: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
    at com.uber.hoodie.io.HoodieCreateHandle.close(HoodieCreateHandle.java:177)
    at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.finish(CopyOnWriteLazyInsertIterable.java:168)
    at com.uber.hoodie.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:42)
    at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:124)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:993)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
    at com.uber.hoodie.common.util.FSUtils.getFileSize(FSUtils.java:126)
    at com.uber.hoodie.io.HoodieCreateHandle.close(HoodieCreateHandle.java:168)
    ... 7 more`

I had already set `.option("hoodie.consistency.check.enabled", "true")`.
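Here is one way to make the file ID question concrete. The failing file in the trace, `a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet`, appears to follow a `<fileId>_<taskPartitionId>_<commitTime>.parquet` layout. That layout is an assumption, inferred from the name and from "Lost task 1.0"; it has not been checked against the 0.4.6 source.

The minimal Java sketch that follows does not use Hudi's API. Its helpers `dataFileName`, `fileIdOf`, `attemptWithFreshId` and `attemptWithStableId` are all hypothetical. It only shows how the two approaches differ:

- If every task attempt picked a fresh random file ID, one task would leave two differently named files.
- If the ID were fixed before the first attempt, the retry would target the same path.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class RetryFileIdSketch {

    // Hypothetical helper: builds a name in the layout assumed from the log,
    // <fileId>_<taskPartitionId>_<commitTime>.parquet (not Hudi's actual code).
    static String dataFileName(String fileId, int taskPartitionId, String commitTime) {
        return fileId + "_" + taskPartitionId + "_" + commitTime + ".parquet";
    }

    // Hypothetical helper: the file ID is everything before the first underscore.
    static String fileIdOf(String fileName) {
        return fileName.substring(0, fileName.indexOf('_'));
    }

    // Simulated attempt that draws a brand-new random file ID each time it runs.
    static String attemptWithFreshId(int taskPartitionId, String commitTime) {
        return dataFileName(UUID.randomUUID().toString(), taskPartitionId, commitTime);
    }

    // Simulated attempt that reuses a file ID chosen once, before the first attempt.
    static String attemptWithStableId(String fileId, int taskPartitionId, String commitTime) {
        return dataFileName(fileId, taskPartitionId, commitTime);
    }

    public static void main(String[] args) {
        String commitTime = "20190528023745";
        int partition = 1;

        // Attempt 0 fails in close(), attempt 1 is the Spark retry.
        Set<String> fresh = new HashSet<>();
        fresh.add(attemptWithFreshId(partition, commitTime));
        fresh.add(attemptWithFreshId(partition, commitTime));

        String stableId = UUID.randomUUID().toString();
        Set<String> stable = new HashSet<>();
        stable.add(attemptWithStableId(stableId, partition, commitTime));
        stable.add(attemptWithStableId(stableId, partition, commitTime));

        System.out.println("fresh ids per attempt -> distinct files: " + fresh.size());
        System.out.println("stable id across attempts -> distinct files: " + stable.size());
        System.out.println(fileIdOf("a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet"));
    }
}
```

Running `main` prints 2 distinct files for fresh IDs, 1 for the stable ID, and then the extracted file ID `a3fe871b-465a-4e55-8626-f518a5888b56`. Under the fresh-ID assumption, the failed attempt's file would remain alongside the retry's file unless something removes it at commit time.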
