codope commented on code in PR #12174:
URL: https://github.com/apache/hudi/pull/12174#discussion_r1837079699


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2136,12 +2136,44 @@ public static HoodieData<HoodieRecord> convertFilesToPartitionStatsRecords(Hoodi
       return engineContext.emptyHoodieData();
     }
     LOG.debug("Indexing following columns for partition stats index: {}", columnsToIndex);
+
+    // Group by partition path and collect file names (BaseFile and LogFiles)
+    Map<String, List<String>> partitionToFileNamesMap = partitionInfoList.stream()
+        .collect(Collectors.toMap(
+            Pair::getKey, // Group by partition path (key of the Pair)
+            pair -> {
+              // Get the FileSlice from the pair
+              FileSlice fileSlice = pair.getValue();
+
+              // Collect BaseFile name if present
+              List<String> fileNames = new ArrayList<>();
+              fileSlice.getBaseFile().ifPresent(baseFile -> fileNames.add(baseFile.getFileName()));
+
+              // Collect LogFile names if present
+              fileSlice.getLogFiles()
+                  .map(HoodieLogFile::getFileName)
+                  .forEach(fileNames::add);
+
+              return fileNames;
+            },
+            // In case of duplicate keys, merge lists of file names
+            (existingList, newList) -> {
+              existingList.addAll(newList);
+              return existingList;
+            }
+        ));
+
+    // Convert the Map<String, List<String>> to List<Pair<String, List<String>>>
+    List<Pair<String, List<String>>> partitionFileNamePairs = partitionToFileNamesMap.entrySet().stream()

Review Comment:
   Both engineContext.parallelize and flatMap work with list data.
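
   For reference, the grouping step in the hunk above can be sketched in plain Java without any Hudi types. This is only an illustrative sketch: `PartitionFileGrouping` and `groupByPartition` are hypothetical names, `Map.Entry` stands in for Hudi's `Pair`, the file names are assumed to be already extracted from each file slice, and no engine context is involved. It shows a list going in and a list coming out, with the map used only as an intermediate.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the PR's grouping step; this is not Hudi code.
class PartitionFileGrouping {

  // Groups (partitionPath, fileNames) entries by partition path, merging the
  // file-name lists of duplicate partitions, then flattens the map back into
  // a list so the result can be handed to a list-based API.
  static List<Map.Entry<String, List<String>>> groupByPartition(
      List<Map.Entry<String, List<String>>> partitionInfo) {
    Map<String, List<String>> grouped = partitionInfo.stream()
        .collect(Collectors.toMap(
            Map.Entry::getKey,
            // Copy into a fresh mutable list so merging never touches the input.
            e -> new ArrayList<>(e.getValue()),
            (existing, incoming) -> {
              existing.addAll(incoming);
              return existing;
            },
            // LinkedHashMap keeps first-seen partition order deterministic.
            LinkedHashMap::new));

    return grouped.entrySet().stream()
        .<Map.Entry<String, List<String>>>map(
            e -> new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
  }
}
```

   In this sketch the merge function mutates only lists created inside the collector, so the caller's input lists are left untouched.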


