guanziyue edited a comment on pull request #4444:
URL: https://github.com/apache/hudi/pull/4444#issuecomment-1044857803


   Thinking this over again, we may find that log files are not always as 
fail-safe as we expected, so we may need more work to make this correct:
   Option 1: Stay focused on this problem and treat such log files like data 
files, since they play exactly the same role. We can delete partially written 
log files when Hudi uses marker files to clean up data files. However, marker 
files currently do not cover every log file when a rollover happens.
   Option 2: Keep the correct results of log writes somewhere outside the log 
file itself, for example in the meta table. Commit metadata is not a good 
choice: append results can be archived before compaction of the corresponding 
fileGroup is triggered, and commit metadata is not designed for queries along 
the fileGroup dimension. 
   We could first go with a workable option and fix the problem completely 
once the meta table is universally used.
   Option 3: Add a mechanism that makes log blocks fail-safe. I want log 
blocks to be written strictly in commit-time order (currently they can 
actually be unsorted). A writer would then first write a defensive rollback 
block that rolls back any failed task attempt for this commit, and only then 
start writing. This may cause a performance problem, because it needs a file 
listing.
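
   To make option 3 more concrete, here is a minimal sketch of the idea using 
a toy in-memory log. It is not Hudi code: `Block`, 
`append_with_defensive_rollback` and `read_valid_payloads` are hypothetical 
names, and the file listing needed to locate the log file is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Toy log block (hypothetical, not a Hudi class)."""
    kind: str          # "data" or "rollback"
    instant: str       # commit time the block belongs to, or targets
    payload: str = ""


def append_with_defensive_rollback(log: List[Block], instant: str,
                                   payloads: List[str]) -> None:
    """Start a write attempt for `instant`.

    A rollback block is always written first, so any blocks left behind
    by an earlier failed attempt of the same commit become invalid.
    """
    log.append(Block("rollback", instant))
    for payload in payloads:
        log.append(Block("data", instant, payload))


def read_valid_payloads(log: List[Block]) -> List[str]:
    """Scan blocks in write order and drop data that was rolled back."""
    valid: List[Block] = []
    for block in log:
        if block.kind == "rollback":
            # Discard everything seen so far for the targeted commit.
            valid = [b for b in valid if b.instant != block.instant]
        else:
            valid.append(block)
    return [b.payload for b in valid]


log: List[Block] = []
append_with_defensive_rollback(log, "001", ["a"])
append_with_defensive_rollback(log, "002", ["b-partial"])  # attempt fails
append_with_defensive_rollback(log, "002", ["b", "c"])     # retried attempt
print(read_valid_payloads(log))
```

   In this model the reader never has to know which attempt succeeded: the 
last attempt of a commit invalidates all earlier ones. The cost is one extra 
block per attempt, on top of the file listing mentioned above.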
   That is all I can come up with so far. I hope to hear more ideas or 
suggestions.



