n3nash commented on a change in pull request #2359:
URL: https://github.com/apache/hudi/pull/2359#discussion_r549981815



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -165,13 +167,17 @@ public boolean archiveIfRequired(HoodieEngineContext context) throws IOException
     HoodieTimeline commitTimeline = table.getCompletedCommitsTimeline();
     Option<HoodieInstant> oldestPendingCompactionInstant =
         table.getActiveTimeline().filterPendingCompactionTimeline().firstInstant();
+    Option<HoodieInstant> oldestInflightInstant =
+        table.getActiveTimeline()
+            .getTimelineOfActions(CollectionUtils.createSet(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION))
Review comment:
       Yes, the retry of both will use the same instant right now. All of the 
lazy cleaning is done only for commits/deltacommits, since there is contention 
when multiple writers run concurrently. Both compaction and clustering will 
continue to do inline rollbacks, since both of these operations are bound to 
happen once scheduled. If the compaction or clustering DON'T run, the failed 
writes will keep lying around. Basically, once a compaction/clustering is 
scheduled, we ensure no other writer can modify the same set of files; since 
there is no contention for the failed writes, the cleanup happens inline. If 
the rollbacks don't happen inline and instead depend on LAZY cleaning, then, 
because we re-use the same instant, we cannot guarantee correctness.
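To make the intent of the diff above concrete: archival must not run past the oldest inflight commit/deltacommit, because a failed write that will be lazily cleaned re-uses its instant time on retry. Below is a minimal, self-contained sketch of that bounding logic. It does not use Hudi's actual classes; `Instant`, `State`, and `archivable` are illustrative stand-ins modeling the timeline filtering shown in the diff.

```java
import java.util.*;
import java.util.stream.*;

public class ArchiveBoundSketch {
    // Illustrative stand-ins for Hudi's HoodieInstant (not the real API).
    enum State { INFLIGHT, COMPLETED }
    record Instant(String ts, String action, State state) {}

    static final Set<String> WRITE_ACTIONS = Set.of("commit", "deltacommit");

    // Archive only completed instants strictly older than the oldest inflight
    // commit/deltacommit, so a lazily rolled-back failed write (which re-uses
    // its instant time on retry) is never archived out from under the retry.
    static List<Instant> archivable(List<Instant> timeline) {
        Optional<String> oldestInflight = timeline.stream()
            .filter(i -> i.state() == State.INFLIGHT
                         && WRITE_ACTIONS.contains(i.action()))
            .map(Instant::ts)
            .min(Comparator.naturalOrder());
        return timeline.stream()
            .filter(i -> i.state() == State.COMPLETED)
            .filter(i -> oldestInflight.map(b -> i.ts().compareTo(b) < 0)
                                       .orElse(true))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> tl = List.of(
            new Instant("001", "commit", State.COMPLETED),
            // failed write awaiting lazy cleaning blocks archival past it
            new Instant("002", "deltacommit", State.INFLIGHT),
            new Instant("003", "commit", State.COMPLETED));
        System.out.println(archivable(tl)); // only 001 is safe to archive
    }
}
```

Note this models only the lazy-cleaning path; as the comment explains, compaction and clustering roll back inline, so their inflight instants are handled separately (in the real code, via `oldestPendingCompactionInstant`).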




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

