nsivabalan commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r683052454



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##########
@@ -601,4 +604,84 @@ public static HoodieWrapperFileSystem getFs(String path, 
SerializableConfigurati
         .filter(fileStatus -> 
!fileStatus.getPath().toString().contains(HoodieTableMetaClient.METAFOLDER_NAME))
         .collect(Collectors.toList());
   }
+
+  /**
+   * Deletes a directory by deleting sub-paths in parallel on the file system.
+   *
+   * @param hoodieEngineContext {@code HoodieEngineContext} instance
+   * @param fs file system
+   * @param dirPath directory path
+   * @param parallelism parallelism to use for sub-paths
+   * @return {@code true} if the directory is delete; {@code false} otherwise.
+   */
+  public static boolean deleteDir(
+      HoodieEngineContext hoodieEngineContext, FileSystem fs, Path dirPath, 
int parallelism) {
+    try {
+      if (fs.exists(dirPath)) {
+        FSUtils.parallelizeSubPathProcess(hoodieEngineContext, fs, dirPath, 
parallelism, e -> true,
+            pairOfSubPathAndConf -> 
deleteSubPath(pairOfSubPathAndConf.getKey(), pairOfSubPathAndConf.getValue())
+        );
+
+        boolean result = fs.delete(dirPath, true);
+        LOG.info("Removed directory at " + dirPath);
+        return result;
+      }
+    } catch (IOException ioe) {
+      throw new HoodieIOException(ioe.getMessage(), ioe);
+    }
+    return false;
+  }
+
+  /**
+   * Processes sub-path in parallel.
+   *
+   * @param hoodieEngineContext {@code HoodieEngineContext} instance
+   * @param fs file system
+   * @param dirPath directory path
+   * @param parallelism parallelism to use for sub-paths
+   * @param subPathPredicate predicate to use to filter sub-paths for 
processing
+   * @param pairFunction actual processing logic for each sub-path
+   * @param <T> type of result to return for each sub-path
+   * @return a map of sub-path to result of the processing
+   */
+  public static <T> Map<String, T> parallelizeSubPathProcess(

Review comment:
       looks neat :) 

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##########
@@ -601,4 +604,84 @@ public static HoodieWrapperFileSystem getFs(String path, 
SerializableConfigurati
         .filter(fileStatus -> 
!fileStatus.getPath().toString().contains(HoodieTableMetaClient.METAFOLDER_NAME))
         .collect(Collectors.toList());
   }
+
+  /**
+   * Deletes a directory by deleting sub-paths in parallel on the file system.
+   *
+   * @param hoodieEngineContext {@code HoodieEngineContext} instance
+   * @param fs file system
+   * @param dirPath directory path
+   * @param parallelism parallelism to use for sub-paths
+   * @return {@code true} if the directory is delete; {@code false} otherwise.
+   */
+  public static boolean deleteDir(
+      HoodieEngineContext hoodieEngineContext, FileSystem fs, Path dirPath, 
int parallelism) {
+    try {
+      if (fs.exists(dirPath)) {
+        FSUtils.parallelizeSubPathProcess(hoodieEngineContext, fs, dirPath, 
parallelism, e -> true,
+            pairOfSubPathAndConf -> 
deleteSubPath(pairOfSubPathAndConf.getKey(), pairOfSubPathAndConf.getValue())
+        );
+
+        boolean result = fs.delete(dirPath, true);
+        LOG.info("Removed directory at " + dirPath);
+        return result;
+      }
+    } catch (IOException ioe) {
+      throw new HoodieIOException(ioe.getMessage(), ioe);
+    }
+    return false;
+  }
+
+  /**
+   * Processes sub-path in parallel.
+   *
+   * @param hoodieEngineContext {@code HoodieEngineContext} instance
+   * @param fs file system
+   * @param dirPath directory path
+   * @param parallelism parallelism to use for sub-paths
+   * @param subPathPredicate predicate to use to filter sub-paths for 
processing
+   * @param pairFunction actual processing logic for each sub-path
+   * @param <T> type of result to return for each sub-path
+   * @return a map of sub-path to result of the processing
+   */
+  public static <T> Map<String, T> parallelizeSubPathProcess(

Review comment:
       Do we have unit tests for these newly added methods. Other devs might 
possible start using it in future. So, lets ensure we have good test coverage. 

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##########
@@ -601,4 +604,84 @@ public static HoodieWrapperFileSystem getFs(String path, 
SerializableConfigurati
         .filter(fileStatus -> 
!fileStatus.getPath().toString().contains(HoodieTableMetaClient.METAFOLDER_NAME))
         .collect(Collectors.toList());
   }
+
+  /**
+   * Deletes a directory by deleting sub-paths in parallel on the file system.
+   *
+   * @param hoodieEngineContext {@code HoodieEngineContext} instance
+   * @param fs file system
+   * @param dirPath directory path
+   * @param parallelism parallelism to use for sub-paths
+   * @return {@code true} if the directory is delete; {@code false} otherwise.
+   */
+  public static boolean deleteDir(

Review comment:
       since this is refactoring even existing DirectMaker files path, we just 
need to be mindful of any regression. We can rely on CI test result. but just a 
reminder. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to