wombatu-kun commented on code in PR #19023:
URL: https://github.com/apache/hudi/pull/19023#discussion_r3426558503
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/TransactionUtils.java:
##########
@@ -141,6 +144,28 @@ public static Option<Pair<HoodieInstant, Map<String,
String>>> getLastCompletedT
return getHoodieInstantAndMetaDataPair(metaClient, hoodieInstantOption);
}
+ /**
+ * Get the last completed transaction hoodie instant before the given
instant time.
+ * The returned instant has both requested time and completion time less
than the given instant time,
+ * ensuring it was fully completed before the given instant was created.
+ *
+ * @param metaClient table meta client
+ * @param currentInstantTime the requested time of the current inflight
instant
+ * @return the last completed instant before the given instant, with its
extra metadata
+ */
+ public static Option<Pair<HoodieInstant, Map<String, String>>>
getLastCompletedTxnInstantAndMetadata(
+ HoodieTableMetaClient metaClient, String currentInstantTime) {
+ Option<HoodieInstant> hoodieInstantOption = Option.fromJavaOptional(
+ metaClient.getActiveTimeline().getCommitsTimeline()
+ .filterCompletedInstants()
+ .findInstantsBefore(currentInstantTime)
+ .getInstantsAsStream()
+ .filter(instant -> instant.getCompletionTime() != null
+ && compareTimestamps(instant.getCompletionTime(), LESSER_THAN,
currentInstantTime))
+ .max(Comparator.comparing(HoodieInstant::getCompletionTime)));
Review Comment:
On the comparator: max requestedTime (among completed instants with
completionTime < currentInstantTime) is the right choice for this PR.
getCandidateInstantsV8AndAbove cuts the candidate set with
findInstantsAfter(lastSuccessful.requestedTime()), and the pre-V8 branch
(getCommitsCompletedSinceLastCommit) does the same, so keying the baseline on
requestedTime keeps the two consistent and avoids leaving an
already-completed-but-later-requested instant in the candidate set. It also
matches the existing baseline path: preWrite uses the single-arg
getLastCompletedTxnInstantAndMetadata, i.e. lastInstant() over the completed
commits timeline.
On moving the fix into the OCC strategy (filter purely by completion time,
drop pendingInflightAndRequestedInstants / timelineRefreshedWithinTransaction):
the direction is cleaner, but resolveWriteConflictIfAny is the shared path for
the Spark and Java writers too, and completion time is not reliably populated
pre-V8 (the new method already has to null-guard getCompletionTime), so it
would need version-gating plus broader cross-engine testing. My suggestion is
to keep this PR to the minimal requestedTime fix and track the strategy rework
as a separate PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]