danny0405 commented on code in PR #12105:
URL: https://github.com/apache/hudi/pull/12105#discussion_r1815874111
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -840,12 +840,17 @@ private Map<String, String> reverseLookupSecondaryKeys(String partitionName, Lis
}
Set<String> keySet = new TreeSet<>(recordKeys);
+ Set<String> deletedRecordsFromLogs = new HashSet<>();
Map<String, HoodieRecord<HoodieMetadataPayload>> logRecordsMap = new HashMap<>();
logRecordScanner.getRecords().forEach(record -> {
HoodieMetadataPayload payload = record.getData();
- String recordKey = payload.getRecordKeyFromSecondaryIndex();
- if (keySet.contains(recordKey)) {
- logRecordsMap.put(recordKey, record);
+ if (!payload.isDeleted()) { // process only valid records.
Review Comment:
This check should be unnecessary if the payload is marked as deleted. Can we
follow the deletion style of
`HoodieBackedTableMetadata.readFromBaseAndMergeWithLogRecords` to make the
merging logic more concise?
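
One way to read this suggestion: rather than guarding each branch with `!payload.isDeleted()` and tracking deleted keys in a separate set, apply all log records over the base records unconditionally and drop delete markers in one pass at the end. The sketch below only illustrates that pattern. `SketchRecord` and `MergeSketch` are hypothetical stand-ins, not Hudi classes, and the actual `readFromBaseAndMergeWithLogRecords` may be structured differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metadata record; not a Hudi class.
final class SketchRecord {
    final String key;
    final String value;
    final boolean deleted; // true means this record is a delete marker

    SketchRecord(String key, String value, boolean deleted) {
        this.key = key;
        this.value = value;
        this.deleted = deleted;
    }
}

final class MergeSketch {
    // Applies log records over base records in order; the latest record per key wins.
    // Delete markers are merged like any other record and filtered out once at the end,
    // so no per-record isDeleted() guard or side set of deleted keys is needed.
    static Map<String, SketchRecord> mergeLogIntoBase(Map<String, SketchRecord> baseRecords,
                                                      List<SketchRecord> logRecords) {
        Map<String, SketchRecord> merged = new HashMap<>(baseRecords);
        for (SketchRecord logRecord : logRecords) {
            merged.put(logRecord.key, logRecord);
        }
        // Drop every key whose latest record is a delete marker.
        merged.values().removeIf(record -> record.deleted);
        return merged;
    }
}
```

With this shape, a key that is deleted and later re-inserted in the logs survives, while a key whose last log record is a delete is removed whether or not it existed in the base records.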
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -931,17 +940,22 @@ private Map<String, List<HoodieRecord<HoodieMetadataPayload>>> lookupSecondaryKe
List<String> sortedSecondaryKeys = new ArrayList<>(secondaryKeys);
secondaryKeySet.addAll(sortedSecondaryKeys);
Collections.sort(sortedSecondaryKeys);
+ Set<String> deletedRecordKeysFromLogs = new HashSet<>();
logRecordScanner.getRecords().forEach(record -> {
HoodieMetadataPayload payload = record.getData();
- String secondaryKey = payload.key;
- if (secondaryKeySet.contains(secondaryKey)) {
- String recordKey = payload.getRecordKeyFromSecondaryIndex();
- logRecordsMap.computeIfAbsent(secondaryKey, k -> new HashMap<>()).put(recordKey, record);
+ if (!payload.isDeleted()) {
+ String secondaryKey = payload.key;
Review Comment:
Can we follow the deletion style of
`HoodieBackedTableMetadata.readFromBaseAndMergeWithLogRecords` to make the
merging logic more concise?
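
For the nested secondary-key lookup, the same idea might look like the sketch below: every log entry for a requested secondary key is applied in order, a delete marker removes its record key, and secondary keys left with no live record keys are dropped at the end. `SecondaryEntry` and `SecondaryMergeSketch` are hypothetical names used only for illustration, and the result is simplified to a set of record keys rather than full `HoodieRecord` values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one secondary-index log entry; not a Hudi class.
final class SecondaryEntry {
    final String secondaryKey;
    final String recordKey;
    final boolean deleted; // true means this entry is a delete marker

    SecondaryEntry(String secondaryKey, String recordKey, boolean deleted) {
        this.secondaryKey = secondaryKey;
        this.recordKey = recordKey;
        this.deleted = deleted;
    }
}

final class SecondaryMergeSketch {
    // Returns, for each requested secondary key, the record keys still live after
    // applying the log entries in order.
    static Map<String, Set<String>> liveRecordKeys(Set<String> requestedSecondaryKeys,
                                                   List<SecondaryEntry> logEntries) {
        Map<String, Set<String>> live = new HashMap<>();
        for (SecondaryEntry entry : logEntries) {
            if (!requestedSecondaryKeys.contains(entry.secondaryKey)) {
                continue; // not part of this lookup
            }
            Set<String> recordKeys = live.computeIfAbsent(entry.secondaryKey, k -> new HashSet<>());
            if (entry.deleted) {
                recordKeys.remove(entry.recordKey); // a delete cancels an earlier insert
            } else {
                recordKeys.add(entry.recordKey);
            }
        }
        // Drop secondary keys that ended up with no live record keys.
        live.values().removeIf(Set::isEmpty);
        return live;
    }
}
```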