scxwhite commented on code in PR #5030:
URL: https://github.com/apache/hudi/pull/5030#discussion_r965590717
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java:
##########
@@ -123,25 +133,24 @@ public long getNumMergedRecordsInLog() {
return numMergedRecordsInLog;
}
- /**
- * Returns the builder for {@code HoodieMergedLogRecordScanner}.
- */
- public static HoodieMergedLogRecordScanner.Builder newBuilder() {
- return new Builder();
- }
-
@Override
protected void processNextRecord(HoodieRecord<? extends HoodieRecordPayload>
hoodieRecord) throws IOException {
String key = hoodieRecord.getRecordKey();
if (records.containsKey(key)) {
// Merge and store the merged record. The HoodieRecordPayload
implementation is free to decide what should be
// done when a delete (empty payload) is encountered before or after an
insert/update.
-
- HoodieRecord<? extends HoodieRecordPayload> oldRecord = records.get(key);
- HoodieRecordPayload oldValue = oldRecord.getData();
- HoodieRecordPayload combinedValue =
hoodieRecord.getData().preCombine(oldValue);
- // If combinedValue is oldValue, no need rePut oldRecord
- if (combinedValue != oldValue) {
+ HoodieRecord<? extends HoodieRecordPayload> storeRecord =
records.get(key);
+ HoodieRecordPayload storeValue = storeRecord.getData();
+ HoodieRecordPayload combinedValue;
+ // If revertLogFile = false, storeRecord is the old record.
+ // If revertLogFile = true, incoming data (hoodieRecord) is the old
record.
+ if (!revertLogFile) {
Review Comment:
> oh I see we have put in a fix here. sounds good.
>
> but does below one holds good?
>
> ```
> delta commit1: insert rec1: val1. preCombine: 2
> delta commit2: delete rec1:
> delta commit2: insert rec1: val2. preCombine: 1
> ```
>
> as per master, guess final snapshot will return val2 for rec1. and not the
deleted one.
>
> can you tell me what will happen w/ this patch where in we reverse the
ordering.
Thank you very much for your patient reply. In your case, your understanding
is correct. In this patch, an incorrect result (rec1: val1. preCombine: 2) will
be generated. In my usage scenario, there are only operations of adding and
updating, not deleting. Therefore, this patch may have some usage conditions.
Such as no deletion, no duplicate data, etc.
In addition,If we encounter deletion during compression, we may be able to
solve it in other ways. For example, when reading the delta logs in reverse
order, if the storeRecord already exists and is deleted during the current
operation, we can mark the storeRecord (deleteoperator = true) and ignore the
current delta record. Finally, it is found in combineandgetupdatevalue that the
incremental record is marked as true (deleteoperator = true), and the final
delta record is returned directly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]