n3nash commented on pull request #2359:
URL: https://github.com/apache/hudi/pull/2359#issuecomment-781149657


   @nsivabalan Addressed your comments, PTAL. @vinothchandar This PR is ready. 
Following is the summary of outstanding items / previous questions. 
   I have tested this PR against the test plan discussed. I have not been able 
to test against async clustering / compaction. This should be technically fine 
since there are no code changes to their rollback path. Let me know if 
you need any further clarification; we can do a quick call to resolve and land 
this PR.
   
   > > The concern I had was with part 2, where a committed write could have 
been archived and we may end up skipping it. Can you please clarify again how 
we guard against that? By ensuring that archival waits for the cleaner to log 
this block?
   > 
   > @vinothchandar I have made 2 changes for multi-writer.
   > 
   > 1. I changed the logic in `HoodieTimelineArchiveLog` to NOT archive 
anything after the oldest inflight instant. For long-running jobs, we want to 
make sure that all instants that happened after it are present and not archived, 
so conflict resolution can be done correctly.
   > 2. I have added a check in the log scanning code that a log block's 
instant should be a) either in the commit timeline or earlier than the earliest 
commit, and b) not present in the inflight timeline.
   > 
   > I am not following your ask around committed writes and skipping; can you 
elaborate? Resolved other comments, left 1 open item.
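
   For reference, here is a minimal sketch of the two guards described in the quoted reply above. It is not the code in this PR: the class and method names are hypothetical, instants are modeled as plain timestamp strings compared lexicographically, and rejecting a block when the commit timeline is empty is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; not the Hudi implementation.
final class MultiWriterGuardsSketch {

    // Change 1: only completed instants strictly older than the oldest
    // inflight instant are eligible for archival. With no inflight
    // instant, every completed instant is eligible.
    static List<String> archivableInstants(List<String> completed, List<String> inflight) {
        Optional<String> oldestInflight = inflight.stream().min(String::compareTo);
        List<String> eligible = new ArrayList<>();
        for (String instant : completed) {
            if (!oldestInflight.isPresent() || instant.compareTo(oldestInflight.get()) < 0) {
                eligible.add(instant);
            }
        }
        return eligible;
    }

    // Change 2: a log block is accepted when its instant is
    // a) in the commit timeline or earlier than the earliest commit, and
    // b) not present in the inflight timeline.
    static boolean isValidLogBlock(String blockInstant, Set<String> completed, Set<String> inflight) {
        if (inflight.contains(blockInstant)) {
            return false; // condition b) fails
        }
        if (completed.contains(blockInstant)) {
            return true; // in the commit timeline
        }
        // Assumption of this sketch: with an empty commit timeline there is
        // no earliest commit to compare against, so the block is rejected.
        Optional<String> earliestCommit = completed.stream().min(String::compareTo);
        return earliestCommit.isPresent() && blockInstant.compareTo(earliestCommit.get()) < 0;
    }
}
```

   With completed instants `001, 002, 004, 005` and `003` inflight, the first guard leaves `004` and `005` on the active timeline, so a long-running writer that started at `003` can still see them during conflict resolution.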
   
   



