prashantwason commented on PR #9545:
URL: https://github.com/apache/hudi/pull/9545#issuecomment-1702595033

   @nsivabalan Did you consider adding a new command block type which could 
work as a commit marker? 
   
   Let's assume commit C1 adds 2 log blocks to a log file, and that the 
log file already has the following content (I am assuming appends are enabled 
on the log file for simplicity, but this should work with appends disabled too).
   Current log file: [log_block_c0_1]
   
   So now commit C1 will add 2 log blocks resulting in:
   Current log file: [log_block_c0_1, log_block_c1_1, log_block_c1_2] 
   
   The issue you have is that Spark stage retries can lead to repeated 
writes of log_block_c1_1 and log_block_c1_2. 
   
   Let's assume that every write of log blocks must end with a valid commit 
command block:
   
   log file: [log_block_c0_1, COMMIT_COMMAND_BLOCK_C0, log_block_c1_1, 
log_block_c1_2, COMMIT_COMMAND_BLOCK_C1]
   
   If a valid commit command block is not found, then the preceding blocks are 
not valid. If multiple COMMIT_COMMAND_BLOCK_XX are found, then the reader can 
choose the last one.
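   
   A minimal sketch of that reader rule, in Python for brevity. This is not Hudi code: `Block`, `is_commit_marker` and `valid_blocks` are hypothetical names, and the sketch assumes each write attempt appends its blocks contiguously and ends with its own marker.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a log block; not a Hudi class."""
    commit: str                     # commit the block belongs to, e.g. "c1"
    name: str                       # e.g. "log_block_c1_1"
    is_commit_marker: bool = False  # True for COMMIT_COMMAND_BLOCK_XX


def valid_blocks(log: List[Block]) -> List[str]:
    """Return the names of data blocks sealed by a commit marker."""
    committed: Dict[str, List[str]] = {}
    pending: List[Block] = []
    for block in log:
        if not block.is_commit_marker:
            pending.append(block)
            continue
        # The marker seals the pending blocks of its commit. A later marker
        # for the same commit replaces the earlier attempt (last one wins).
        committed[block.commit] = [
            b.name for b in pending if b.commit == block.commit
        ]
        pending = []
    # Blocks still pending here were never followed by a marker: dropped.
    return [name for names in committed.values() for name in names]


log = [
    Block("c0", "log_block_c0_1"),
    Block("c0", "COMMIT_COMMAND_BLOCK_C0", is_commit_marker=True),
    Block("c1", "log_block_c1_1"),
    Block("c1", "log_block_c1_2"),
    Block("c1", "COMMIT_COMMAND_BLOCK_C1", is_commit_marker=True),
    # A Spark stage retry rewrites the same blocks and marker.
    Block("c1", "log_block_c1_1"),
    Block("c1", "log_block_c1_2"),
    Block("c1", "COMMIT_COMMAND_BLOCK_C1", is_commit_marker=True),
    # A write that never reached its marker.
    Block("c2", "log_block_c2_1"),
]
result = valid_blocks(log)
print(result)
```

   The sketch does not handle a partial attempt (blocks with no marker) that is followed by a complete retry of the same commit in the same file; distinguishing those would need something extra, such as an attempt identifier in the blocks.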
   
   This idea is similar to how databases use START_COMMIT and END_COMMIT 
markers in a WAL (write-ahead log). 
   

