wombatu-kun commented on code in PR #19023:
URL: https://github.com/apache/hudi/pull/19023#discussion_r3426558503


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/TransactionUtils.java:
##########
@@ -141,6 +144,28 @@ public static Option<Pair<HoodieInstant, Map<String, 
String>>> getLastCompletedT
     return getHoodieInstantAndMetaDataPair(metaClient, hoodieInstantOption);
   }
 
+  /**
+   * Get the last completed transaction hoodie instant before the given 
instant time.
+   * The returned instant has both requested time and completion time less 
than the given instant time,
+   * ensuring it was fully completed before the given instant was created.
+   *
+   * @param metaClient table meta client
+   * @param currentInstantTime the requested time of the current inflight 
instant
+   * @return the last completed instant before the given instant, with its 
extra metadata
+   */
+  public static Option<Pair<HoodieInstant, Map<String, String>>> 
getLastCompletedTxnInstantAndMetadata(
+      HoodieTableMetaClient metaClient, String currentInstantTime) {
+    Option<HoodieInstant> hoodieInstantOption = Option.fromJavaOptional(
+        metaClient.getActiveTimeline().getCommitsTimeline()
+            .filterCompletedInstants()
+            .findInstantsBefore(currentInstantTime)
+            .getInstantsAsStream()
+            .filter(instant -> instant.getCompletionTime() != null
+                && compareTimestamps(instant.getCompletionTime(), LESSER_THAN, 
currentInstantTime))
+            .max(Comparator.comparing(HoodieInstant::getCompletionTime)));

Review Comment:
   On the comparator: max requestedTime (among completed instants with 
completionTime < currentInstantTime) is the right choice for this PR. 
getCandidateInstantsV8AndAbove cuts the candidate set with 
findInstantsAfter(lastSuccessful.requestedTime()), and the pre-V8 branch 
(getCommitsCompletedSinceLastCommit) does the same, so keying the baseline on 
requestedTime keeps the two consistent and avoids leaving an 
already-completed-but-later-requested instant in the candidate set. It also 
matches the existing baseline path: preWrite uses the single-arg 
getLastCompletedTxnInstantAndMetadata, i.e. lastInstant() over the completed 
commits timeline.
   
   On moving the fix into the OCC strategy (filter purely by completion time, 
drop pendingInflightAndRequestedInstants / timelineRefreshedWithinTransaction): 
the direction is cleaner, but resolveWriteConflictIfAny is the shared path for 
the Spark and Java writers too, and completion time is not reliably populated 
pre-V8 (the new method already has to null-guard getCompletionTime), so it 
would need version-gating plus broader cross-engine testing. My suggestion is 
to keep this PR to the minimal requestedTime fix and track the strategy rework 
as a separate PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to