Zhujun-Vungle opened a new issue #697: Hi, the Spark retry problem still exists in 0.4.6 with the consistency check on
URL: https://github.com/apache/incubator-hudi/issues/697
 
 
   Running version 0.4.6.
   We still get duplicate files due to Spark retries, caused by an S3 file-not-found error.
   I didn't dig into the code details, but do we still generate a new file id on retry?
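   To make the concern concrete, here is a minimal sketch of the duplicate-file scenario (class and method names are hypothetical illustrations, not Hudi's actual code): if each task attempt picks a fresh random file id, a failed-then-retried task leaves the first attempt's file behind, whereas a name derived deterministically from the task id would make the retry target the same file.

   ```java
   import java.util.HashSet;
   import java.util.Set;
   import java.util.UUID;

   // Hypothetical illustration only, not Hudi's implementation.
   public class RetryFileIds {
       static Set<String> writtenRandom = new HashSet<>();
       static Set<String> writtenDeterministic = new HashSet<>();

       // Each attempt generates a fresh random file id.
       static void attemptRandom(int taskId) {
           writtenRandom.add(taskId + "_" + UUID.randomUUID() + ".parquet");
       }

       // File name depends only on the task id, so a retry overwrites.
       static void attemptDeterministic(int taskId) {
           writtenDeterministic.add("task-" + taskId + ".parquet");
       }

       public static void main(String[] args) {
           attemptRandom(1);        // first attempt fails after writing its file
           attemptRandom(1);        // retry writes under a new id -> duplicate left behind
           attemptDeterministic(1);
           attemptDeterministic(1); // retry targets the same name -> no duplicate
           System.out.println(writtenRandom.size() + " vs " + writtenDeterministic.size());
       }
   }
   ```
   
   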
   
   ```
   2019-05-28 02:38:12 INFO  MapOutputTrackerMasterEndpoint:54 - Asked to send map output locations for shuffle 1308 to 172.19.101.41:47318
   2019-05-28 02:38:18 WARN  TaskSetManager:66 - Lost task 1.0 in stage 4306.0 (TID 1021141, ip-172-19-101-139, executor 1): java.lang.RuntimeException: com.uber.hoodie.exception.HoodieException: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
           at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:121)
           at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
           at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
           at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
           at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:378)
           at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
           at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
           at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
           at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
           at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
           at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
           at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
           at org.apache.spark.scheduler.Task.run(Task.scala:109)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: com.uber.hoodie.exception.HoodieException: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
           at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable.computeNext(CopyOnWriteLazyInsertIterable.java:106)
           at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable.computeNext(CopyOnWriteLazyInsertIterable.java:45)
           at com.uber.hoodie.func.LazyIterableIterator.next(LazyIterableIterator.java:119)
           ... 20 more
   Caused by: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
           at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
           at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable.computeNext(CopyOnWriteLazyInsertIterable.java:102)
           ... 22 more
   Caused by: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
           at java.util.concurrent.FutureTask.report(FutureTask.java:122)
           at java.util.concurrent.FutureTask.get(FutureTask.java:192)
           at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:144)
           ... 23 more
   Caused by: com.uber.hoodie.exception.HoodieInsertException: Failed to close the Insert Handle for path s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
           at com.uber.hoodie.io.HoodieCreateHandle.close(HoodieCreateHandle.java:177)
           at com.uber.hoodie.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.finish(CopyOnWriteLazyInsertIterable.java:168)
           at com.uber.hoodie.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:42)
           at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:124)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           ... 3 more
   Caused by: java.io.FileNotFoundException: No such file or directory: s3a://vungle2-dataeng/jun-test/hudi/20190527/2019-05-28_02/a3fe871b-465a-4e55-8626-f518a5888b56_1_20190528023745.parquet
           at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:993)
           at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
           at com.uber.hoodie.common.util.FSUtils.getFileSize(FSUtils.java:126)
           at com.uber.hoodie.io.HoodieCreateHandle.close(HoodieCreateHandle.java:168)
           ... 7 more
   ```
   
   I had set `.option("hoodie.consistency.check.enabled", "true")`.
   
![image](https://user-images.githubusercontent.com/22140275/58459290-9730a500-815d-11e9-8799-1e484a73543c.png)
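   My understanding is that the consistency check is meant to guard against S3's eventually consistent listing by polling until a newly written file becomes visible before reading it back. A minimal conceptual sketch (the `FakeStore` and `waitTillVisible` names are hypothetical, not Hudi's actual API):

   ```java
   import java.util.HashSet;
   import java.util.Set;

   // Conceptual sketch of a visibility-wait guard; not Hudi's implementation.
   public class ConsistencySketch {
       // Simulates an eventually consistent store: a written file only shows
       // up in existence checks after a few polls.
       static class FakeStore {
           private final Set<String> visible = new HashSet<>();
           private final Set<String> pending = new HashSet<>();
           private int polls = 0;
           private final int becomesVisibleAfter;

           FakeStore(int becomesVisibleAfter) { this.becomesVisibleAfter = becomesVisibleAfter; }

           void put(String path) { pending.add(path); } // write acknowledged, listing lags

           boolean exists(String path) {
               polls++;
               if (polls >= becomesVisibleAfter && pending.remove(path)) visible.add(path);
               return visible.contains(path);
           }
       }

       // Poll with a bounded number of attempts until the path is visible.
       static boolean waitTillVisible(FakeStore store, String path, int maxAttempts, long sleepMs) {
           for (int i = 0; i < maxAttempts; i++) {
               if (store.exists(path)) return true;
               try {
                   Thread.sleep(sleepMs);
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   return false;
               }
           }
           return false;
       }

       public static void main(String[] args) {
           FakeStore store = new FakeStore(3); // file becomes visible on the 3rd poll
           store.put("part-0001.parquet");
           boolean ok = waitTillVisible(store, "part-0001.parquet", 5, 1);
           System.out.println(ok ? "visible" : "timed out");
       }
   }
   ```

   If the retry limit is exhausted before the file shows up, a subsequent `getFileStatus` can still fail exactly as in the trace above, so the check narrows the window but may not eliminate it.
   
   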
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services