kou commented on code in PR #38888:
URL: https://github.com/apache/arrow/pull/38888#discussion_r1407195527


##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -970,6 +970,78 @@ class AzureFileSystem::Impl {
     return stream;
   }
 
+ private:
+  Status DeleteDirContentsWihtoutHierarchicalNamespace(const AzureLocation& location,
+                                                       bool missing_dir_ok) {
+    auto container_client =
+        blob_service_client_->GetBlobContainerClient(location.container);
+    Azure::Storage::Blobs::ListBlobsOptions options;
+    options.Prefix = internal::EnsureTrailingSlash(location.path);
+    // https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks
+    //
+    // Only supports up to 256 subrequests in a single batch. The
+    // size of the body for a batch request can't exceed 4 MB.
+    const int32_t kNumMaxRequestsInBatch = 256;
+    options.PageSizeHint = kNumMaxRequestsInBatch;
+    try {
+      auto list_response = container_client.ListBlobs(options);
+      if (!missing_dir_ok && list_response.Blobs.empty()) {
+        return Status::IOError("Specified directory doesn't exist: ", location.path, ": ",
+                               container_client.GetUrl());
+      }
+      while (list_response.HasPage() && !list_response.Blobs.empty()) {
+        auto batch = container_client.CreateBatch();
+        std::vector<Azure::Storage::DeferredResponse<
+            Azure::Storage::Blobs::Models::DeleteBlobResult>>
+            deferred_responses;
+        for (const auto& blob_item : list_response.Blobs) {
+          deferred_responses.push_back(batch.DeleteBlob(blob_item.Name));

Review Comment:
   I think we should handle that in https://github.com/apache/arrow/issues/38772 instead.
   We already have a test for this case, so we will detect it when we work on that issue.



##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -970,6 +970,78 @@ class AzureFileSystem::Impl {
     return stream;
   }
 
+ private:
+  Status DeleteDirContentsWihtoutHierarchicalNamespace(const AzureLocation& location,
+                                                       bool missing_dir_ok) {
+    auto container_client =
+        blob_service_client_->GetBlobContainerClient(location.container);
+    Azure::Storage::Blobs::ListBlobsOptions options;
+    options.Prefix = internal::EnsureTrailingSlash(location.path);
+    // https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks
+    //
+    // Only supports up to 256 subrequests in a single batch. The
+    // size of the body for a batch request can't exceed 4 MB.
+    const int32_t kNumMaxRequestsInBatch = 256;
+    options.PageSizeHint = kNumMaxRequestsInBatch;
+    try {
+      auto list_response = container_client.ListBlobs(options);
+      if (!missing_dir_ok && list_response.Blobs.empty()) {
+        return Status::IOError("Specified directory doesn't exist: ", location.path, ": ",
+                               container_client.GetUrl());

Review Comment:
   Good catch!



##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1017,69 +1089,67 @@ class AzureFileSystem::Impl {
             exception);
       }
     } else {
-      auto container_client =
-          blob_service_client_->GetBlobContainerClient(location.container);
-      Azure::Storage::Blobs::ListBlobsOptions options;
-      options.Prefix = internal::EnsureTrailingSlash(location.path);
-      // https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks
-      //
-      // Only supports up to 256 subrequests in a single batch. The
-      // size of the body for a batch request can't exceed 4 MB.
-      const int32_t kNumMaxRequestsInBatch = 256;
-      options.PageSizeHint = kNumMaxRequestsInBatch;
+      return DeleteDirContentsWihtoutHierarchicalNamespace(location, true);
+    }
+  }
+
+  Status DeleteDirContents(const AzureLocation& location, bool missing_dir_ok) {
+    if (location.container.empty()) {
+      return internal::InvalidDeleteDirContents(location.all);
+    }
+    if (location.path.empty()) {
+      return internal::InvalidDeleteDirContents(location.all);

Review Comment:
   We have a special API for this case: `DeleteRootDirContents()`
   
   https://github.com/apache/arrow/blob/63353baf1cda1d1fc7bb614ce01558c12990e073/cpp/src/arrow/filesystem/filesystem.h#L233-L249
   
   So I think we should return an error here.
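   
   A minimal sketch of that intent, assuming a simplified status type. `SketchStatus` and `ValidateDeleteDirContentsTarget()` are hypothetical names used only for illustration; they are not Arrow APIs, and the error text is not Arrow's actual message:

```cpp
#include <string>

// Hypothetical minimal status type for this sketch; arrow::Status is richer.
struct SketchStatus {
  bool ok;
  std::string message;
};

// Hypothetical helper: reject an empty container or an empty path, because
// deleting the contents of a root is reserved for DeleteRootDirContents().
SketchStatus ValidateDeleteDirContentsTarget(const std::string& container,
                                             const std::string& path) {
  if (container.empty() || path.empty()) {
    return {false,
            "DeleteDirContents called on a root path; "
            "use DeleteRootDirContents() instead"};
  }
  return {true, ""};
}
```

   With this shape, a call with only a container name fails early, and a call with a container and a directory path proceeds to the actual deletion.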



##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1017,69 +1089,67 @@ class AzureFileSystem::Impl {
             exception);
       }
     } else {
-      auto container_client =
-          blob_service_client_->GetBlobContainerClient(location.container);
-      Azure::Storage::Blobs::ListBlobsOptions options;
-      options.Prefix = internal::EnsureTrailingSlash(location.path);
-      // https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks
-      //
-      // Only supports up to 256 subrequests in a single batch. The
-      // size of the body for a batch request can't exceed 4 MB.
-      const int32_t kNumMaxRequestsInBatch = 256;
-      options.PageSizeHint = kNumMaxRequestsInBatch;
+      return DeleteDirContentsWihtoutHierarchicalNamespace(location, true);

Review Comment:
   OK. I'll add the comment.



##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -970,6 +970,78 @@ class AzureFileSystem::Impl {
     return stream;
   }
 
+ private:
+  Status DeleteDirContentsWihtoutHierarchicalNamespace(const AzureLocation& location,
+                                                       bool missing_dir_ok) {
+    auto container_client =
+        blob_service_client_->GetBlobContainerClient(location.container);
+    Azure::Storage::Blobs::ListBlobsOptions options;
+    options.Prefix = internal::EnsureTrailingSlash(location.path);
+    // https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks
+    //
+    // Only supports up to 256 subrequests in a single batch. The
+    // size of the body for a batch request can't exceed 4 MB.
+    const int32_t kNumMaxRequestsInBatch = 256;
+    options.PageSizeHint = kNumMaxRequestsInBatch;
+    try {
+      auto list_response = container_client.ListBlobs(options);
+      if (!missing_dir_ok && list_response.Blobs.empty()) {
+        return Status::IOError("Specified directory doesn't exist: ", location.path, ": ",
+                               container_client.GetUrl());
+      }
+      while (list_response.HasPage() && !list_response.Blobs.empty()) {

Review Comment:
   Both the `HasPage()` and `!Blobs.empty()` checks are needed to avoid submitting an empty `SubmitBatch()` request.
   If there are no blobs under the given path, `ListBlobs()` returns a response with `HasPage() == true` and `Blobs.empty() == true`.
   
   I'll use `for` as you suggested. I also found an azure-sdk-for-cpp document that uses `for`:
   https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/storage/MigrationGuide.md#listing-blobs-in-a-container
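   
   A rough sketch of the `for` shape, using a stand-in for the paged response so it compiles without the SDK. `FakePagedResponse` and `CountSubmittedBatches()` are hypothetical names for illustration; only `HasPage()`, `MoveToNextPage()` and `Blobs` mirror the SDK's paged-response interface, and the final code in the PR may differ:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SDK's paged list response (not the Azure type).
class FakePagedResponse {
 public:
  explicit FakePagedResponse(std::vector<std::vector<std::string>> pages)
      : pages_(std::move(pages)) {
    Load();
  }
  bool HasPage() const { return has_page_; }
  void MoveToNextPage() {
    ++index_;
    Load();
  }
  std::vector<std::string> Blobs;  // blob names on the current page

 private:
  // Refresh Blobs and the has-page flag for the current page index.
  void Load() {
    has_page_ = index_ < pages_.size();
    Blobs = has_page_ ? pages_[index_] : std::vector<std::string>{};
  }
  std::vector<std::vector<std::string>> pages_;
  std::size_t index_ = 0;
  bool has_page_ = false;
};

// Walks every page and counts how many batches would be submitted.
// A page with no blobs is skipped so that no empty batch is sent.
std::size_t CountSubmittedBatches(
    const std::vector<std::vector<std::string>>& pages) {
  FakePagedResponse response(pages);
  std::size_t submitted = 0;
  for (; response.HasPage(); response.MoveToNextPage()) {
    if (response.Blobs.empty()) {
      continue;  // nothing to delete on this page
    }
    ++submitted;  // the real code would build and submit a delete batch here
  }
  return submitted;
}
```

   An empty listing is modeled here as a single page with no blobs, which matches the `HasPage() == true` and `Blobs.empty() == true` case described above.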



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
