boneanxs commented on PR #9922:
URL: https://github.com/apache/hudi/pull/9922#issuecomment-1786515927

   > > Thanks for the fix. At a high level, I think we should avoid relying 
on Spark mechanisms to add rollback/cleaning improvements here; it's hacky to 
maintain and not tenable for all engines.
   > 
   > Agreed. However, to address this issue we would need a mechanism for 
ignoring corrupted files created by zombie tasks, which at this stage is not 
trivial to implement.
   > 
   > In the most vanilla deployment of Hudi (no MDT), a "VALID" base file is 
basically the file with the largest timestamp in a file group that is not in 
any replacecommit.
   > 
   > If we want to change this at a high level, we will need to modify the 
heuristics for determining what counts as a "VALID" base file.
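
   To make the heuristic described in the quote concrete, here is a minimal 
sketch of it. `BaseFile` and `LatestBaseFileHeuristic` are hypothetical names 
for illustration, not Hudi's real classes, and the sketch assumes instant 
times are fixed-width timestamp strings so that string order matches time 
order. Note that nothing here inspects file contents, which is presumably why 
a partial file left by a zombie task could still be selected.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a base file; not Hudi's actual representation.
final class BaseFile {
    final String fileGroupId;
    final String instantTime; // assumed fixed-width, so lexical order == time order

    BaseFile(String fileGroupId, String instantTime) {
        this.fileGroupId = fileGroupId;
        this.instantTime = instantTime;
    }
}

// Hypothetical helper illustrating the "largest timestamp wins" rule.
final class LatestBaseFileHeuristic {
    // For every file group that is not replaced, keep the base file
    // with the largest instant time.
    static Map<String, BaseFile> latestPerFileGroup(
            Collection<BaseFile> files, Set<String> replacedFileGroups) {
        Map<String, BaseFile> latest = new HashMap<>();
        for (BaseFile f : files) {
            if (replacedFileGroups.contains(f.fileGroupId)) {
                continue; // file group was replaced by a replacecommit
            }
            latest.merge(f.fileGroupId, f,
                (a, b) -> a.instantTime.compareTo(b.instantTime) >= 0 ? a : b);
        }
        return latest;
    }
}
```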
   
   Yes, I agree with @voonhous. Besides, the main purpose of this PR is to 
stop a task promptly once it has been interrupted, instead of leaving it 
hanging and wasting resources. It could also greatly reduce the chance of 
partial files being left behind, if we clean up files on the task side when 
the task fails.
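
   As a rough illustration of that idea (not the code in this PR), the sketch 
below checks the thread's interrupt flag between records and deletes the 
partially written file if the write does not complete. `InterruptAwareWriter` 
and `writeAll` are hypothetical names, and it assumes interruption is 
delivered through the standard Java thread interrupt flag rather than an 
engine-specific signal.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

// Hypothetical helper, not part of Hudi.
final class InterruptAwareWriter {
    // Writes all records to target. Stops as soon as the current thread is
    // interrupted and removes the partial file on any failure.
    static void writeAll(Path target, Iterator<byte[]> records)
            throws IOException, InterruptedException {
        boolean completed = false;
        try (OutputStream out = Files.newOutputStream(target)) {
            while (records.hasNext()) {
                // Thread.interrupted() also clears the flag, following the
                // usual convention before throwing InterruptedException.
                if (Thread.interrupted()) {
                    throw new InterruptedException("interrupted while writing " + target);
                }
                out.write(records.next());
            }
            completed = true;
        } finally {
            // The stream is already closed at this point (try-with-resources
            // closes resources before the finally block runs).
            if (!completed) {
                Files.deleteIfExists(target);
            }
        }
    }
}
```

   This only covers tasks that are still alive to run their cleanup; a task 
that is killed outright can still leave a partial file behind.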


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
