TheR1sing3un commented on code in PR #12344:
URL: https://github.com/apache/hudi/pull/12344#discussion_r1875745898


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIndexUtils.java:
##########
@@ -117,55 +116,74 @@ public static Option<HoodieConsistentHashingMetadata> loadMetadata(HoodieTable t
         return filename.contains(HASHING_METADATA_FILE_SUFFIX);
       };
       final List<StoragePathInfo> metaFiles = metaClient.getStorage().listDirectEntries(metadataPath);
-      final TreeSet<String> commitMetaTss = metaFiles.stream().filter(hashingMetaCommitFilePredicate)
-          .map(commitFile -> HoodieConsistentHashingMetadata.getTimestampFromFile(commitFile.getPath().getName()))
-          .sorted()
-          .collect(Collectors.toCollection(TreeSet::new));
-      final List<StoragePathInfo> hashingMetaFiles = metaFiles.stream().filter(hashingMetadataFilePredicate)
-          .sorted(Comparator.comparing(f -> f.getPath().getName()))
+
+      final TreeMap<String/*instantTime*/, Pair<StoragePathInfo/*hash metadata file path*/, Boolean/*commited*/>> versionedHashMetadataFiles = metaFiles.stream()
+          .filter(hashingMetadataFilePredicate)
+          .map(metaFile -> {
+            String instantTime = HoodieConsistentHashingMetadata.getTimestampFromFile(metaFile.getPath().getName());
+            return Pair.of(instantTime, Pair.of(metaFile, false));
+          })
+          .sorted(Collections.reverseOrder())
+          .collect(Collectors.toMap(Pair::getLeft, Pair::getRight, (a, b) -> a, TreeMap::new));
+
+      metaFiles.stream().filter(hashingMetaCommitFilePredicate)
+          .forEach(commitFile -> {
+            String instantTime = HoodieConsistentHashingMetadata.getTimestampFromFile(commitFile.getPath().getName());
+            if (!versionedHashMetadataFiles.containsKey(instantTime)) {

Review Comment:
   > Can we just fix the metadata path and keep the other logic as it is, so 
that it is easier to review? If you want to refactor the code, let's open 
another PR to address it.
   
   I originally intended to do as you suggested, but fixing the metadata path 
exposed a second problem in the original logic: if there is an inflight 
clustering when the metadata is loaded, the original code returns empty, which 
is not expected. I therefore fixed both problems in the same change. Fixing 
them separately is difficult, because with only the path fix applied the 
original unit test would fail.
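
   To illustrate the inflight case, here is a minimal, self-contained sketch. 
It is not the code in this PR and does not use Hudi classes: the class name, 
the method name, the plain file-name strings, and the two suffix constants are 
all assumptions made for the example. It also assumes the expected result is 
the newest hashing metadata that has a matching commit marker, so an 
uncommitted (inflight) version is skipped instead of making the lookup return 
empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class HashingMetadataSketch {
  // Assumed suffixes, used only for this illustration.
  static final String META_SUFFIX = ".hashing_meta";
  static final String COMMIT_SUFFIX = ".commit";

  /** Returns the newest instant that has both a metadata file and a commit marker. */
  static Optional<String> latestCommittedInstant(List<String> fileNames) {
    // instantTime -> committed flag, ordered by instant time.
    TreeMap<String, Boolean> committedByInstant = new TreeMap<>();

    // Step 1: register every hashing metadata file as "not committed yet".
    for (String name : fileNames) {
      if (name.endsWith(META_SUFFIX)) {
        committedByInstant.putIfAbsent(instantOf(name), false);
      }
    }

    // Step 2: flip the flag for instants that also have a commit marker.
    for (String name : fileNames) {
      if (name.endsWith(COMMIT_SUFFIX)) {
        committedByInstant.computeIfPresent(instantOf(name), (k, v) -> true);
      }
    }

    // Step 3: walk from newest to oldest and skip uncommitted (inflight) versions.
    for (Map.Entry<String, Boolean> e : committedByInstant.descendingMap().entrySet()) {
      if (e.getValue()) {
        return Optional.of(e.getKey());
      }
    }
    return Optional.empty();
  }

  // Hypothetical helper: the instant time is the file name without its suffix.
  static String instantOf(String fileName) {
    return fileName.substring(0, fileName.lastIndexOf('.'));
  }

  public static void main(String[] args) {
    // Instant 002 has metadata but no commit marker, i.e. it is still inflight.
    List<String> files = Arrays.asList("001.hashing_meta", "001.commit", "002.hashing_meta");
    System.out.println(latestCommittedInstant(files));
  }
}
```

   With the inflight `002` present, this sketch still resolves to `001`. 
The real implementation additionally has to consult the timeline, which is 
left out here.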



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.