nsivabalan commented on code in PR #13830:
URL: https://github.com/apache/hudi/pull/13830#discussion_r2322640955


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -619,9 +621,10 @@ public static <R> HoodieData<HoodieRecord<R>> 
tagGlobalLocationBackToRecords(
           Option<HoodieRecordGlobalLocation> currentLocOpt = 
Option.ofNullable(v.getRight().orElse(null));
           if (currentLocOpt.isPresent()) {
             HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
-            boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup
-                || !Objects.equals(incomingRecord.getPartitionPath(), 
currentLoc.getPartitionPath());
-            if (shouldUpdatePartitionPath && shouldDoMergedLookUpThenTag) {
+            boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup // 
handle event time ordering updates
+                || shouldUpdatePartitionPath && 
!Objects.equals(incomingRecord.getPartitionPath(), 
currentLoc.getPartitionPath())  // handle partition updates
+                || (!isCommitTimeOrdered && 
incomingRecord.isDelete(writerSchema.get(), properties)); // handle event time 
ordering deletes

Review Comment:
   after this patch, even if `updatePartitionPath` is false, and for COW table 
w/ event time ordering, for incoming delete records, we are merging w/ existing 
records here i.e incurring I/O or merge cost.
   
   which was not the case before. is my understanding right?
   So, how were deletes w/ lower ordering values (above case) was working in 
0.x? Or we did not have any tests for this scenario only. 
   
   atleast for COW, when `updatePartitionPath` is false, we will be routing the 
incoming records to erge handle and merge handle should be taking care of lower 
ordering value reconciliation right? 
   
   since this entire global index is very complex, may be I am missing to 
connect few things. 
   can you help clarify this. 
   
   



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -619,9 +621,10 @@ public static <R> HoodieData<HoodieRecord<R>> 
tagGlobalLocationBackToRecords(
           Option<HoodieRecordGlobalLocation> currentLocOpt = 
Option.ofNullable(v.getRight().orElse(null));
           if (currentLocOpt.isPresent()) {
             HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
-            boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup
-                || !Objects.equals(incomingRecord.getPartitionPath(), 
currentLoc.getPartitionPath());
-            if (shouldUpdatePartitionPath && shouldDoMergedLookUpThenTag) {
+            boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup // 
handle event time ordering updates
+                || shouldUpdatePartitionPath && 
!Objects.equals(incomingRecord.getPartitionPath(), 
currentLoc.getPartitionPath())  // handle partition updates
+                || (!isCommitTimeOrdered && 
incomingRecord.isDelete(writerSchema.get(), properties)); // handle event time 
ordering deletes

Review Comment:
   may be we should take up the opportunity to add some detailed java docs to 
`tagGlobalLocationBackToRecords` on diff branching we have and necessity for 
some of them. 
   everytime, a reviewer has to spend 10 mins or so to go over entire code 
block to understand diff branches and dependencies. 
   not really a blocker comment, but would be good to add them so that, its 
easier for maintenance going forward. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to