nsivabalan commented on code in PR #5829:
URL: https://github.com/apache/hudi/pull/5829#discussion_r894916125


##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -82,27 +82,31 @@ public List<String> getAllPartitionPaths() throws 
IOException {
       int listingParallelism = Math.min(DEFAULT_LISTING_PARALLELISM, 
pathsToList.size());
 
       // List all directories in parallel
-      List<FileStatus[]> dirToFileListing = engineContext.map(pathsToList, 
path -> {
+      List<Pair<Path, FileStatus[]>> dirToFileListing = 
engineContext.map(pathsToList, path -> {
         FileSystem fileSystem = path.getFileSystem(hadoopConf.get());
-        return fileSystem.listStatus(path);
+        return Pair.of(path, fileSystem.listStatus(path));
       }, listingParallelism);
       pathsToList.clear();
 
       // if current dictionary contains PartitionMetadata, add it to result
       // if current dictionary does not contain PartitionMetadata, add it to 
queue
-      dirToFileListing.stream().flatMap(Arrays::stream).parallel()
-          .forEach(fileStatus -> {
-            if (fileStatus.isDirectory()) {
-              if (HoodiePartitionMetadata.hasPartitionMetadata(fs, 
fileStatus.getPath())) {
-                partitionPaths.add(FSUtils.getRelativePartitionPath(new 
Path(datasetBasePath), fileStatus.getPath()));
-              } else if 
(!fileStatus.getPath().getName().equals(HoodieTableMetaClient.METAFOLDER_NAME)) 
{
-                pathsToList.add(fileStatus.getPath());
-              }
-            } else if 
(fileStatus.getPath().getName().startsWith(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX))
 {
-              String partitionName = FSUtils.getRelativePartitionPath(new 
Path(datasetBasePath), fileStatus.getPath().getParent());
-              partitionPaths.add(partitionName);
-            }
-          });
+      dirToFileListing.forEach(p -> {
+        Option<FileStatus> partitionMetaFile = 
Option.fromJavaOptional(Arrays.stream(p.getRight()).parallel()
+            .filter(fileStatus -> 
fileStatus.getPath().getName().equals(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX))
+            .findFirst());
+
+        if (partitionMetaFile.isPresent()) {
+          // Is a partition.
+          String partitionName = FSUtils.getRelativePartitionPath(new 
Path(datasetBasePath), p.getLeft());
+          partitionPaths.add(partitionName);
+        } else {
+          // Add sub-dirs to the queue
+          pathsToList.addAll(Arrays.stream(p.getRight())
+              .filter(fileStatus -> fileStatus.isDirectory() && 
!fileStatus.getPath().getName().equals(HoodieTableMetaClient.METAFOLDER_NAME))

Review Comment:
   I tried to incorporate this. I wanted to see if we can do this only for 
first time where we list the base path. but we have to do it within the while 
loop and so we might do it unnecessarily for every loop anyways. will go ahead 
w/ the patch for now for 0.11.1. but lets discuss and I can put up a follow up 
PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to