Copilot commented on code in PR #17294:
URL: https://github.com/apache/pinot/pull/17294#discussion_r2578579062
##########
pinot-plugins/pinot-file-system/pinot-hdfs/src/main/java/org/apache/pinot/plugin/filesystem/HadoopPinotFS.java:
##########
@@ -88,6 +88,83 @@ public boolean delete(URI segmentUri, boolean forceDelete)
return _hadoopFS.delete(new Path(segmentUri), true);
}
+ @Override
+ public boolean deleteBatch(List<URI> segmentUris, boolean forceDelete)
+ throws IOException {
+ if (segmentUris == null || segmentUris.isEmpty()) {
+ return true;
+ }
+
+ boolean result = true;
+ List<Path> pathsToDelete = new ArrayList<>();
+
+ LOGGER.info("Deleting batch of {} URIs, forceDelete: {}",
segmentUris.size(), forceDelete);
+
+ for (URI segmentUri : segmentUris) {
+ try {
+ Path path = new Path(segmentUri);
+
+ // Check if path exists before attempting deletion
+ if (!_hadoopFS.exists(path)) {
+ LOGGER.debug("Path {} does not exist, skipping deletion",
segmentUri);
+ continue;
+ }
+
+ if (isDirectory(segmentUri)) {
+ if (!forceDelete) {
+ // Check if directory is empty
+ if (listFiles(segmentUri, false).length > 0) {
+ LOGGER.warn("Directory {} is not empty and forceDelete is false,
skipping", segmentUri);
+ result = false;
+ continue;
+ }
+ }
+ // For directories, recursively collect all files
+ collectFilesRecursively(path, pathsToDelete);
+ } else {
+ pathsToDelete.add(path);
+ }
+ } catch (IOException e) {
+ LOGGER.warn("Error processing path {} for batch deletion", segmentUri,
e);
+ result = false;
+ }
+ }
+
+ // Perform batch deletion
+ if (!pathsToDelete.isEmpty()) {
+ LOGGER.info("Deleting {} paths in batch", pathsToDelete.size());
+ for (Path path : pathsToDelete) {
+ try {
+ if (!_hadoopFS.delete(path, true)) {
+ LOGGER.warn("Failed to delete path: {}", path);
+ result = false;
+ }
+ } catch (IOException e) {
+ LOGGER.warn("Error deleting path: {}", path, e);
+ result = false;
+ }
+ }
Review Comment:
The batch deletion implementation still deletes files one by one in a loop,
negating the performance benefit of batch operations. Consider using Hadoop's
bulk delete APIs if available, or at least parallelize the deletion operations
using ExecutorService to improve performance for large batches.
##########
pinot-plugins/pinot-file-system/pinot-hdfs/src/main/java/org/apache/pinot/plugin/filesystem/HadoopPinotFS.java:
##########
@@ -88,6 +88,83 @@ public boolean delete(URI segmentUri, boolean forceDelete)
return _hadoopFS.delete(new Path(segmentUri), true);
}
+ @Override
+ public boolean deleteBatch(List<URI> segmentUris, boolean forceDelete)
+ throws IOException {
+ if (segmentUris == null || segmentUris.isEmpty()) {
+ return true;
+ }
+
+ boolean result = true;
+ List<Path> pathsToDelete = new ArrayList<>();
+
+ LOGGER.info("Deleting batch of {} URIs, forceDelete: {}",
segmentUris.size(), forceDelete);
+
+ for (URI segmentUri : segmentUris) {
+ try {
+ Path path = new Path(segmentUri);
+
+ // Check if path exists before attempting deletion
+ if (!_hadoopFS.exists(path)) {
+ LOGGER.debug("Path {} does not exist, skipping deletion",
segmentUri);
+ continue;
+ }
+
+ if (isDirectory(segmentUri)) {
+ if (!forceDelete) {
+ // Check if directory is empty
+ if (listFiles(segmentUri, false).length > 0) {
+ LOGGER.warn("Directory {} is not empty and forceDelete is false,
skipping", segmentUri);
+ result = false;
+ continue;
+ }
Review Comment:
When `forceDelete` is false and the directory is empty, the code skips
`collectFilesRecursively()` but should still add the empty directory to
`pathsToDelete` for deletion. Currently, empty directories are not deleted when
`forceDelete` is false.
```suggestion
}
// Directory is empty, add it for deletion
pathsToDelete.add(path);
continue;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]