guanziyue edited a comment on pull request #4444: URL: https://github.com/apache/hudi/pull/4444#issuecomment-1044857803
Thinking this over again: log files are not always as fail-safe as we expected, so we may need to do more to make this correct.

Option 1: Stay focused on this problem and treat such log files like data files, since they play exactly the same role. Hudi could delete partially written log files at the point where it uses marker files to clean up data files. However, marker files currently do not cover every log file created when a rollover happens.

Option 2: Keep the correct results of log writes somewhere outside the log file itself, for example in the metadata table. Commit metadata is not a good choice: append results can be archived before compaction of the corresponding file group is triggered, and commit metadata is not designed to be queried by file group. We could adopt a workable version of this option first and complete the fix once the metadata table is universally used.

Option 3: Add a mechanism that makes log blocks fail-safe. I would like log blocks to be written strictly in commit-time order (currently they can be unsorted). Then, before a task attempt starts writing, it would first write a defensive rollback block that rolls back any failed task attempt for the same commit. This may cause a performance problem because it requires a file listing.

That is all I can come up with so far. More ideas and suggestions are welcome.
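
To make option 3 more concrete, here is a minimal sketch of the idea in Python. It is not Hudi code: `LogBlock`, `start_attempt`, `append_data`, and `valid_payloads` are hypothetical names, the log is an in-memory list rather than a file, and rollover and file listing are ignored. It only shows how a rollback block written at the start of each attempt could let a reader discard blocks left behind by earlier failed attempts of the same commit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogBlock:
    """Hypothetical log block; not the real Hudi log block format."""
    kind: str           # "data" or "rollback"
    commit_time: str    # commit the block belongs to (data) or targets (rollback)
    payload: str = ""   # record content, unused for rollback blocks


def start_attempt(log: List[LogBlock], commit_time: str) -> None:
    """Append a defensive rollback block before the attempt writes any data."""
    log.append(LogBlock("rollback", commit_time))


def append_data(log: List[LogBlock], commit_time: str, payload: str) -> None:
    """Append one data block for the given commit."""
    log.append(LogBlock("data", commit_time, payload))


def valid_payloads(log: List[LogBlock]) -> List[str]:
    """Scan blocks in write order and drop data blocks that a later
    rollback block for the same commit invalidates."""
    valid: List[LogBlock] = []
    for block in log:
        if block.kind == "rollback":
            # Everything written earlier for this commit came from a failed attempt.
            valid = [b for b in valid if b.commit_time != block.commit_time]
        else:
            valid.append(block)
    return [b.payload for b in valid]


log: List[LogBlock] = []

# Commit 001 succeeds on its first attempt.
start_attempt(log, "001")
append_data(log, "001", "a")

# First attempt of commit 002 fails after a partial write.
start_attempt(log, "002")
append_data(log, "002", "b-partial")

# The retry rolls back the partial write, then writes the full data.
start_attempt(log, "002")
append_data(log, "002", "b")
append_data(log, "002", "c")

print(valid_payloads(log))  # ['a', 'b', 'c']
```

The sketch relies on blocks being read in the order they were written, which is why the ordering requirement in option 3 matters. In a real log, the writer would also have to find every log file of the file group to place the rollback block correctly, which is where the file-listing cost comes from.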
