nsivabalan commented on code in PR #13830:
URL: https://github.com/apache/hudi/pull/13830#discussion_r2322640955
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -619,9 +621,10 @@ public static <R> HoodieData<HoodieRecord<R>>
tagGlobalLocationBackToRecords(
Option<HoodieRecordGlobalLocation> currentLocOpt =
Option.ofNullable(v.getRight().orElse(null));
if (currentLocOpt.isPresent()) {
HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
- boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup
- || !Objects.equals(incomingRecord.getPartitionPath(),
currentLoc.getPartitionPath());
- if (shouldUpdatePartitionPath && shouldDoMergedLookUpThenTag) {
+ boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup //
handle event time ordering updates
+ || shouldUpdatePartitionPath &&
!Objects.equals(incomingRecord.getPartitionPath(),
currentLoc.getPartitionPath()) // handle partition updates
+ || (!isCommitTimeOrdered &&
incomingRecord.isDelete(writerSchema.get(), properties)); // handle event time
ordering deletes
Review Comment:
after this patch, even if `updatePartitionPath` is false, and for COW table
w/ event time ordering, for incoming delete records, we are merging w/ existing
records here i.e incurring I/O or merge cost.
which was not the case before. is my understanding right?
So, how were deletes w/ lower ordering values (above case) was working in
0.x? Or we did not have any tests for this scenario only.
atleast for COW, when `updatePartitionPath` is false, we will be routing the
incoming records to erge handle and merge handle should be taking care of lower
ordering value reconciliation right?
since this entire global index is very complex, may be I am missing to
connect few things.
can you help clarify this.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -619,9 +621,10 @@ public static <R> HoodieData<HoodieRecord<R>>
tagGlobalLocationBackToRecords(
Option<HoodieRecordGlobalLocation> currentLocOpt =
Option.ofNullable(v.getRight().orElse(null));
if (currentLocOpt.isPresent()) {
HoodieRecordGlobalLocation currentLoc = currentLocOpt.get();
- boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup
- || !Objects.equals(incomingRecord.getPartitionPath(),
currentLoc.getPartitionPath());
- if (shouldUpdatePartitionPath && shouldDoMergedLookUpThenTag) {
+ boolean shouldDoMergedLookUpThenTag = mayContainDuplicateLookup //
handle event time ordering updates
+ || shouldUpdatePartitionPath &&
!Objects.equals(incomingRecord.getPartitionPath(),
currentLoc.getPartitionPath()) // handle partition updates
+ || (!isCommitTimeOrdered &&
incomingRecord.isDelete(writerSchema.get(), properties)); // handle event time
ordering deletes
Review Comment:
may be we should take up the opportunity to add some detailed java docs to
`tagGlobalLocationBackToRecords` on diff branching we have and necessity for
some of them.
everytime, a reviewer has to spend 10 mins or so to go over entire code
block to understand diff branches and dependencies.
not really a blocker comment, but would be good to add them so that, its
easier for maintenance going forward.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]