vamshipasunuru opened a new issue, #14200:
URL: https://github.com/apache/hudi/issues/14200

   ### Bug Description
   
   
   **What happened:**
   
   In Hudi-Flink, we observed that when timeline-server-based markers were
used, the rollback did not delete all the files that were part of the failed
commit. As a result, Hudi includes those leftover files once the failed commit's
instants (`.inflight` and `.requested`) are archived.
   
   **What you expected:**
   All files created by the failed ingestion commit should be deleted by the rollback.
   
   **Steps to reproduce:**
   1. Simulate a commit failure by restarting the Flink job. The ingestion should
have generated marker files and partially written data files. The timeline will
contain only the `.requested` and `.inflight` instants for that commit.
   2. The next ingestion write cleans up failed commits. The clean-up finishes
without errors, but not all data files tracked in the marker directory are
deleted. This was evident from the logs in
`MarkerBasedRollbackStrategy.getRollbackRequests`: the count of files read did
not match the count of files written during the commit. We suspect a bug in the
timeline server contributes to this.
   
   ### Environment
   
   **Hudi version:**
   0.14
   **Query engine:** (Spark/Flink/Trino etc)
   Flink
   **Relevant configs:**
   hoodie.cleaner.prewrite.cleaner.policy=rollback_failed_writes
   hoodie.write.markers.type=TIMELINE_SERVER_BASED
   
   ### Logs and Stack Trace
   
   _No response_

