Re: [PR] [HUDI-8648] Fix a bug for secondary index deletion [hudi]

via GitHub Wed, 11 Dec 2024 15:17:33 -0800


nsivabalan commented on code in PR #12447:
URL: https://github.com/apache/hudi/pull/12447#discussion_r1881147804



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -919,4 +886,42 @@ private Map<String, HoodieRecord<HoodieMetadataPayload>> fetchBaseFileAllRecords
       return SecondaryIndexKeyUtils.getRecordKeyFromSecondaryIndexKey(record.getRecordKey());
     }, record -> record));
   }
+
+  @VisibleForTesting
+  public static Map<String, String> reverseLookupSecondaryKeysInternal(List<String> recordKeys,
+                                                                       Map<String, HoodieRecord<HoodieMetadataPayload>> baseFileRecords,
+                                                                       HoodieMetadataLogRecordReader logRecordScanner) {
+    Map<String, String> recordKeyMap = new HashMap<>();
+    Set<String> keySet = new TreeSet<>(recordKeys);
+    Set<String> deletedRecordsFromLogs = new HashSet<>();
+    Map<String, HoodieRecord<HoodieMetadataPayload>> logRecordsMap = new HashMap<>();
+    // Note that: we read the log records from the oldest to the latest!!!
+    // If we change the read order, we need update the following logic accordingly.
+    logRecordScanner.getRecords().forEach(record -> {
+      String recordKey = SecondaryIndexKeyUtils.getRecordKeyFromSecondaryIndexKey(record.getRecordKey());
+      HoodieMetadataPayload payload = record.getData();
+      if (!payload.isDeleted()) { // process only valid records.
+        if (keySet.contains(recordKey)) {
+          logRecordsMap.put(recordKey, record);
+        }
+      } else {
+        // When and Only when the latest log record is non-tombstone, logRecordMap contains its recordKey.
+        logRecordsMap.remove(recordKey);
+        deletedRecordsFromLogs.add(recordKey);
+      }
+    });
+
+    // Return non-tombstone records from the log files.
+    logRecordsMap.forEach((key, value) -> recordKeyMap.put(
+        key, SecondaryIndexKeyUtils.getSecondaryKeyFromSecondaryIndexKey(value.getRecordKey())));
+    // Return non-tombstone records from the base file.
+    if (baseFileRecords != null) {
+      baseFileRecords.forEach((key, value) -> {
+        if (!deletedRecordsFromLogs.contains(key)) {
+          recordKeyMap.put(key, SecondaryIndexKeyUtils.getSecondaryKeyFromSecondaryIndexKey(value.getRecordKey()));

Review Comment:
   Why are we not merging the base file records with the log records? I
   understand that for a secondary index a record is either created or deleted,
   so there is no real merge, but let's keep the original code as is and just
   add the fixes on top of it.
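
For readers following the thread, the lookup in the hunk above can be sketched without any Hudi dependencies. This is only an illustration under stated assumptions: `ReverseLookupSketch`, `Entry`, and `reverseLookup` are hypothetical stand-ins rather than Hudi classes, the log entries are assumed to arrive oldest first, and the base file entries are assumed to be already restricted to the requested record keys. It mirrors the two passes visible in the diff (log records first, then base file records that were not tombstoned in the logs) and does not reproduce the original merge-based code the comment refers to.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, dependency-free stand-in for the lookup shown in the diff.
final class ReverseLookupSketch {

  // Simplified secondary index entry; this is NOT a Hudi class.
  static final class Entry {
    final String recordKey;
    final String secondaryKey;
    final boolean deleted;

    private Entry(String recordKey, String secondaryKey, boolean deleted) {
      this.recordKey = recordKey;
      this.secondaryKey = secondaryKey;
      this.deleted = deleted;
    }

    static Entry live(String recordKey, String secondaryKey) {
      return new Entry(recordKey, secondaryKey, false);
    }

    static Entry tombstone(String recordKey, String secondaryKey) {
      return new Entry(recordKey, secondaryKey, true);
    }
  }

  // Returns recordKey -> secondaryKey for the requested record keys.
  static Map<String, String> reverseLookup(List<String> recordKeys,
                                           Map<String, Entry> baseFileRecords,
                                           List<Entry> logRecordsOldestFirst) {
    Set<String> wanted = new HashSet<>(recordKeys);
    Set<String> deletedInLogs = new HashSet<>();
    Map<String, String> fromLogs = new HashMap<>();

    // Pass 1: replay log entries oldest to newest.
    for (Entry e : logRecordsOldestFirst) {
      if (e.deleted) {
        // A tombstone drops any earlier live entry and hides the base file entry.
        fromLogs.remove(e.recordKey);
        deletedInLogs.add(e.recordKey);
      } else if (wanted.contains(e.recordKey)) {
        fromLogs.put(e.recordKey, e.secondaryKey);
      }
    }

    // Pass 2: add base file entries that no log tombstone has hidden.
    Map<String, String> result = new HashMap<>(fromLogs);
    if (baseFileRecords != null) {
      baseFileRecords.forEach((key, value) -> {
        if (!deletedInLogs.contains(key)) {
          result.put(key, value.secondaryKey);
        }
      });
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Entry> base = new HashMap<>();
    base.put("r1", Entry.live("r1", "cityA"));
    base.put("r2", Entry.live("r2", "cityB"));
    base.put("r4", Entry.live("r4", "cityE"));

    List<Entry> logs = Arrays.asList(
        Entry.tombstone("r2", "cityB"), // r2 moves from cityB ...
        Entry.live("r2", "cityC"),      // ... to cityC
        Entry.tombstone("r1", "cityA"), // r1 is deleted
        Entry.live("r9", "cityZ"),      // not requested, ignored
        Entry.live("r3", "cityD"));     // r3 exists only in the logs

    Map<String, String> result =
        reverseLookup(Arrays.asList("r1", "r2", "r3", "r4"), base, logs);
    System.out.println(new TreeMap<>(result)); // {r2=cityC, r3=cityD, r4=cityE}
  }
}
```

In this sketch a key that is tombstoned and later re-created in the logs resolves to the newest secondary key, while a key that is only tombstoned disappears from the result, which is the created-or-deleted behavior the comment describes.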



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
