n3nash commented on a change in pull request #2359:
URL: https://github.com/apache/hudi/pull/2359#discussion_r549981815
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -165,13 +167,17 @@ public boolean archiveIfRequired(HoodieEngineContext context) throws IOException
HoodieTimeline commitTimeline = table.getCompletedCommitsTimeline();
     Option<HoodieInstant> oldestPendingCompactionInstant = table.getActiveTimeline().filterPendingCompactionTimeline().firstInstant();
+    Option<HoodieInstant> oldestInflightInstant = table.getActiveTimeline()
+        .getTimelineOfActions(CollectionUtils.createSet(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION))
Review comment:
Yes, the retry of both will use the same instant right now. All of the
lazy cleaning is done only for commits/deltacommits, since there is contention
when multiple writers start running. Both compaction and clustering will
continue to do inline rollbacks, since both of these operations are bound to
happen once scheduled. If the compaction or clustering doesn't run, the failed
writes will keep lying around. Basically, once a compaction/clustering is
scheduled, we ensure no other writer can modify the same set of files, so there
is no contention for the failed writes, and hence the cleanup happens inline. If
the cleanup didn't happen inline and instead depended on LAZY cleaning, then we
could not guarantee correctness, since we re-use the same instant.
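
To illustrate the guard the diff above adds, here is a minimal stand-alone sketch (the `MiniInstant`/`ArchiveBound` names are mine, not Hudi's classes): archival is bounded by the oldest commit/deltacommit that is not yet completed, because with lazy cleaning a failed writer may still retry that same instant.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified model of a timeline instant: a timestamp, an action, and a state.
final class MiniInstant {
    final String ts, action, state;
    MiniInstant(String ts, String action, String state) {
        this.ts = ts; this.action = action; this.state = state;
    }
}

final class ArchiveBound {
    // Find the oldest commit/deltacommit that has not completed. A failed
    // writer may retry on this instant, so the archiver must not pass it.
    static Optional<String> oldestInflightCommitTs(List<MiniInstant> timeline) {
        return timeline.stream()
                .filter(i -> Set.of("commit", "deltacommit").contains(i.action))
                .filter(i -> !"COMPLETED".equals(i.state))
                .map(i -> i.ts)
                .min(Comparator.naturalOrder());
    }

    // Only completed instants strictly older than that bound are archivable.
    static List<MiniInstant> archivable(List<MiniInstant> timeline) {
        String bound = oldestInflightCommitTs(timeline).orElse(null);
        return timeline.stream()
                .filter(i -> "COMPLETED".equals(i.state))
                .filter(i -> bound == null || i.ts.compareTo(bound) < 0)
                .collect(Collectors.toList());
    }
}
```

With instants `001` (completed commit), `002` (inflight deltacommit), `003` (completed commit), only `001` is archivable, since `002` may still be retried.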
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]