vinothchandar commented on a change in pull request #4400:
URL: https://github.com/apache/hudi/pull/4400#discussion_r772751429
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -264,8 +264,11 @@ HoodieCompactionPlan generateCompactionPlan(
.getLatestFileSlices(partitionPath)
.filter(slice -> !fgIdsInPendingCompactionAndClustering.contains(slice.getFileGroupId()))
.map(s -> {
+ // We can think that the latest data is in the latest delta log file, so we sort it from large
Review comment:
I think you are assuming that later writes in the log always overwrite
the earlier ones? That is not always true.
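To illustrate the concern, here is a minimal, self-contained sketch. It assumes a payload whose `preCombine` keeps the record with the larger ordering value rather than the one written last. `OrderingMergeSketch`, `VersionedPayload`, and `mergeInLogOrder` are hypothetical names for illustration only, not Hudi classes.

```java
public class OrderingMergeSketch {

    // Hypothetical stand-in for a payload that carries an ordering value.
    static final class VersionedPayload {
        final String value;
        final long orderingVal;

        VersionedPayload(String value, long orderingVal) {
            this.value = value;
            this.orderingVal = orderingVal;
        }

        // Keep whichever payload has the larger ordering value;
        // on a tie the newer payload (this) wins.
        VersionedPayload preCombine(VersionedPayload older) {
            return older.orderingVal > this.orderingVal ? older : this;
        }
    }

    // Fold the writes in the order they appear in the log.
    static String mergeInLogOrder(VersionedPayload... writes) {
        VersionedPayload current = null;
        for (VersionedPayload next : writes) {
            current = (current == null) ? next : next.preCombine(current);
        }
        return current.value;
    }

    public static void main(String[] args) {
        VersionedPayload early = new VersionedPayload("written-first", 20L);
        VersionedPayload late = new VersionedPayload("written-later", 10L);
        // The later write carries a smaller ordering value, so it loses.
        System.out.println(mergeInLogOrder(early, late)); // prints "written-first"
    }
}
```

Under this assumption the surviving record is decided by the ordering value, so the latest delta log file does not necessarily hold the data that ends up in the merged result.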
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java
##########
@@ -142,9 +142,11 @@ protected void processNextRecord(HoodieRecord<? extends HoodieRecordPayload> hoo
HoodieRecord<? extends HoodieRecordPayload> oldRecord = records.get(key);
HoodieRecordPayload oldValue = oldRecord.getData();
HoodieRecordPayload combinedValue = hoodieRecord.getData().preCombine(oldValue);
- boolean choosePrev = combinedValue.equals(oldValue);
- HoodieOperation operation = choosePrev ? oldRecord.getOperation() : hoodieRecord.getOperation();
- records.put(key, new HoodieRecord<>(new HoodieKey(key, hoodieRecord.getPartitionPath()), combinedValue, operation));
+ // If combinedValue is oldValue, no need rePut oldRecord
+ if (!combinedValue.equals(oldValue)) {
Review comment:
This feels like a valid optimization.
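A rough sketch of the idea, for reference. The hunk above is cut off after the `if`, so the body shown here (re-putting only when the combined value differs from the old one) is an assumption. `SkipRePutSketch`, `Payload`, and `processNext` are hypothetical names, not Hudi classes, and `Payload` does not override `equals`, so the comparison is by identity.

```java
import java.util.HashMap;
import java.util.Map;

public class SkipRePutSketch {

    // Hypothetical payload; preCombine keeps the larger ordering value.
    static final class Payload {
        final String value;
        final long orderingVal;

        Payload(String value, long orderingVal) {
            this.value = value;
            this.orderingVal = orderingVal;
        }

        Payload preCombine(Payload older) {
            return older.orderingVal > this.orderingVal ? older : this;
        }
    }

    // Returns true when the map was written, false when the existing entry was kept.
    static boolean processNext(Map<String, Payload> records, String key, Payload incoming) {
        Payload old = records.get(key);
        if (old == null) {
            records.put(key, incoming);
            return true;
        }
        Payload combined = incoming.preCombine(old);
        // If the combined value is the old value, the map already holds it: skip the put.
        if (!combined.equals(old)) {
            records.put(key, combined);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Payload> records = new HashMap<>();
        System.out.println(processNext(records, "k1", new Payload("a", 5L))); // true
        System.out.println(processNext(records, "k1", new Payload("b", 3L))); // false
        System.out.println(processNext(records, "k1", new Payload("c", 9L))); // true
        System.out.println(records.get("k1").value);                          // c
    }
}
```

When the old value wins, the sketch avoids both the map write and the allocation of a new record wrapper, which is where the saving would come from.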