eric9204 commented on issue #6308: URL: https://github.com/apache/hudi/issues/6308#issuecomment-1232381134
@nsivabalan I know what you mean, but there is another problem. If one of the writers fails, the Hudi commit transaction fails on the Hudi side, yet the micro batch is still marked successful on the Spark side. The next micro batch will therefore consume records starting from the offset of the last "successful" commit, so the data of that micro batch, which actually failed on the Hudi side, may be lost:

```
22/08/05 15:01:18 INFO HoodieStreamingSink: Ignore the exception and move on streaming as per hoodie.datasource.write.streaming.ignore.failed.batch configuration
22/08/05 15:01:18 INFO HoodieStreamingSink: Micro batch id=32 succeeded
22/08/05 15:01:18 INFO BlockManager: Removing RDD 159
22/08/05 15:01:18 INFO CheckpointFileManager: Writing atomically to hdfs://host-10-4-6-18:8020/tmp/hudi/ckp1/commits/32 using temp file hdfs://host-10-4-6-18:8020/tmp/hudi/ckp1/commits/.32.a829b364-c1b9-4b4b-8fae-f7d866ed76e5.tmp
22/08/05 15:01:18 INFO CheckpointFileManager: Renamed temp file hdfs://host-10-4-6-18:8020/tmp/hudi/ckp1/commits/.32.a829b364-c1b9-4b4b-8fae-f7d866ed76e5.tmp to hdfs://host-10-4-6-18:8020/tmp/hudi/ckp1/commits/32
```

Could the community add a retry strategy to make the batch succeed instead of just discarding it?
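As a workaround until then, one way to avoid silent data loss is to stop swallowing failed batches, so the stream fails loudly and the checkpoint does not advance past a commit that never landed on the Hudi side. A minimal sketch of the write options, assuming the `streaming.retry.*` keys are available in your Hudi version (the `ignore.failed.batch` key is the one shown in the log above; the retry keys and values here are assumptions, not confirmed for issue #6308):

```python
# Hypothetical option map for a Spark Structured Streaming write into Hudi.
# Setting ignore.failed.batch to false makes a failed Hudi commit fail the
# Spark micro batch too, so the checkpoint is not advanced past lost data.
# The retry.count / retry.interval.ms keys are assumed to exist in the
# target Hudi version; verify against your version's configuration docs.
hudi_stream_opts = {
    "hoodie.datasource.write.streaming.ignore.failed.batch": "false",  # fail fast, do not drop the batch
    "hoodie.datasource.write.streaming.retry.count": "3",              # assumed: retry the commit 3 times
    "hoodie.datasource.write.streaming.retry.interval.ms": "2000",     # assumed: wait 2s between retries
}

# Usage sketch (df, table name, and paths are placeholders):
#   (df.writeStream
#      .format("hudi")
#      .options(**hudi_stream_opts)
#      .option("checkpointLocation", "/tmp/hudi/ckp1")
#      .start("/tmp/hudi/my_table"))
print(hudi_stream_opts["hoodie.datasource.write.streaming.ignore.failed.batch"])
```

With `ignore.failed.batch` set to `false`, Spark restarts from the last checkpoint on failure and re-reads the batch whose Hudi commit did not complete, instead of moving on as in the log above.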
