nsivabalan commented on pull request #2359:
URL: https://github.com/apache/hudi/pull/2359#issuecomment-781506162


   IIUC, we are delegating the cleanup of failed writes to the cleaner. Did you 
folks discuss the possibility of introducing another async table service, called 
FailedJobsCleaner or something similar? We can't force users to make regular 
cleaning more aggressive, since they might have their own reasons to keep a 
longer retention period. But what if they wish to clean up failed writes sooner, 
just not inline? ;) Also, from my understanding, this new "FailedJobsCleaner" 
would not need to coordinate with anything or take locks; it would just need to 
delete unreachable files. So the overhead may be small compared to other jobs. 
It would not need to update metadata either, right? Or would it? 
   
   I understand that we keep adding more weight to async operations (cleaning, 
compaction, clustering, etc.). Anyway, I wanted to hear your thoughts. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

