hudi-bot opened a new issue, #16243:
URL: https://github.com/apache/hudi/issues/16243

   The current algorithm takes two passes over the log blocks:
    # First pass to collect all the valid blocks along with their block instant 
times, including each rollback block's target instant time.
    # Second pass, in reverse order of block instant time, to track the final 
compacted instant times for each block.
   
   Now that we have removed appending to the same log file for multiple 
deltacommits, we can probably scan in a single pass by keeping an active list 
or hash map of block instant times to their corresponding blocks, updating it 
as we go. This should be tested for the following scenarios:
    # Out-of-order merged blocks: log compaction is scheduled and, by the time 
it appends its block, another writer has already added a block.
    # A log compaction operation fails, so a rollback is issued for its block. 
The rollback block can be the very next block or can arrive at a later point 
in time.
    # Log compaction is executing and, before it commits, compaction starts 
running on the same file group.
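
A minimal sketch of the single-pass idea, to make the proposal concrete. This is not Hudi code: `Block`, its fields (`instant`, `kind`, `target`, `covers`) and `scan_single_pass` are hypothetical names, and the model assumes each block carries one instant time, a rollback block names one target instant, and a log-compacted block lists the instants it replaces. Commit validation against the timeline and the third scenario (a concurrent compaction on the same file group) are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    # Hypothetical stand-in for a log block; not a Hudi class.
    instant: str                   # instant time that wrote this block
    kind: str = "data"             # "data", "compacted", or "rollback"
    target: Optional[str] = None   # rollback only: instant being rolled back
    covers: Tuple[str, ...] = ()   # compacted only: instants it replaces


def scan_single_pass(blocks: List[Block]) -> List[Block]:
    """Return the blocks to merge, in merge order, after one forward scan."""
    # active: instant -> (slot, block); the slot fixes the merge order.
    active: Dict[str, Tuple[int, Block]] = {}
    # stash: compacted instant -> entries it displaced, kept in case the
    # compacted block is rolled back later.
    stash: Dict[str, Dict[str, Tuple[int, Block]]] = {}

    for pos, block in enumerate(blocks):
        if block.kind == "rollback":
            # The rollback may be the next block or arrive much later;
            # either way the target is looked up by instant time.
            if active.pop(block.target, None) is not None:
                # Restore whatever a rolled-back compacted block displaced.
                active.update(stash.pop(block.target, {}))
        elif block.kind == "compacted":
            displaced = {t: active.pop(t) for t in block.covers if t in active}
            stash[block.instant] = displaced
            # Take the slot of the earliest displaced block, so a block that
            # another writer appended in between still merges afterwards.
            slot = min((p for p, _ in displaced.values()), default=pos)
            active[block.instant] = (slot, block)
        else:
            active[block.instant] = (pos, block)

    return [b for _, b in sorted(active.values(), key=lambda entry: entry[0])]
```

Under these assumptions, a log containing data blocks `001`, `002`, `003` followed by a compacted block `004` covering `001` and `002` resolves to `004` then `003`, which is the out-of-order case. Appending a data block `005` and then a rollback of `004` restores the original blocks and resolves to `001`, `002`, `003`, `005`.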
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6888
   - Type: Task
   - Epic: https://issues.apache.org/jira/browse/HUDI-3580
   - Fix version(s):
     - 1.1.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]