felipecrv commented on code in PR #38888:
URL: https://github.com/apache/arrow/pull/38888#discussion_r1407999501
##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -970,6 +970,78 @@ class AzureFileSystem::Impl {
return stream;
}
+ private:
+ Status DeleteDirContentsWihtoutHierarchicalNamespace(const AzureLocation& location,
+ bool missing_dir_ok) {
+ auto container_client =
+ blob_service_client_->GetBlobContainerClient(location.container);
+ Azure::Storage::Blobs::ListBlobsOptions options;
+ options.Prefix = internal::EnsureTrailingSlash(location.path);
+ // https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks
+ //
+ // Only supports up to 256 subrequests in a single batch. The
+ // size of the body for a batch request can't exceed 4 MB.
+ const int32_t kNumMaxRequestsInBatch = 256;
+ options.PageSizeHint = kNumMaxRequestsInBatch;
+ try {
+ auto list_response = container_client.ListBlobs(options);
+ if (!missing_dir_ok && list_response.Blobs.empty()) {
+ return Status::IOError("Specified directory doesn't exist: ", location.path, ": ",
+ container_client.GetUrl());
+ }
+ while (list_response.HasPage() && !list_response.Blobs.empty()) {
Review Comment:
> `ListBlobs()` returns a response with `HasPage() == true` and `Blobs.empty() == true`.

Is there a chance of that not being the last page? I see that you now only `continue;` instead of breaking out of the entire loop, which I think is the more robust thing to do. 👍
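To make the question concrete, here is a minimal sketch, assuming a paged listing that can yield an empty page before later non-empty ones. `FakeListBlobsPager`, `DeleteStoppingAtEmptyPage`, and `DeleteSkippingEmptyPages` are hypothetical names; the pager only loosely imitates the `HasPage()` / `Blobs` / `MoveToNextPage()` shape used in the diff and is not the Azure SDK type. "Deleting" is just counting blob names, one batch per page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the paged response returned by ListBlobs().
// Only the members used in the diff are imitated (HasPage(), Blobs,
// MoveToNextPage()); this is NOT the Azure SDK type.
struct FakeListBlobsPager {
  std::vector<std::vector<std::string>> pages;
  std::size_t current = 0;
  std::vector<std::string> Blobs;  // blob names on the current page

  explicit FakeListBlobsPager(std::vector<std::vector<std::string>> p)
      : pages(std::move(p)) {
    if (!pages.empty()) Blobs = pages[0];
  }
  bool HasPage() const { return current < pages.size(); }
  void MoveToNextPage() {
    ++current;
    Blobs = HasPage() ? pages[current] : std::vector<std::string>{};
  }
};

// Batch limit quoted in the diff's comment (256 subrequests per batch).
constexpr std::size_t kNumMaxRequestsInBatch = 256;

// Modeled on the loop condition shown in the diff: it stops at the first
// empty page, even if later pages still hold blobs.
// Returns the number of blobs "deleted".
std::size_t DeleteStoppingAtEmptyPage(FakeListBlobsPager pager) {
  std::size_t deleted = 0;
  while (pager.HasPage() && !pager.Blobs.empty()) {
    deleted += pager.Blobs.size();  // one batch per page
    pager.MoveToNextPage();
  }
  return deleted;
}

// Skips empty pages and stops only once the pager is exhausted.
std::size_t DeleteSkippingEmptyPages(FakeListBlobsPager pager) {
  std::size_t deleted = 0;
  for (; pager.HasPage(); pager.MoveToNextPage()) {
    if (pager.Blobs.empty()) continue;  // not necessarily the last page
    // PageSizeHint = 256 is meant to keep each page within one batch.
    assert(pager.Blobs.size() <= kNumMaxRequestsInBatch);
    deleted += pager.Blobs.size();
  }
  return deleted;
}
```

With pages `{"dir/a", "dir/b"}`, `{}`, `{"dir/c"}`, the first function counts 2 deletions and the second counts 3, so an empty page that is not the last one is exactly where breaking out of the loop would leave blobs behind.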
##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1017,69 +1089,67 @@ class AzureFileSystem::Impl {
exception);
}
} else {
- auto container_client =
- blob_service_client_->GetBlobContainerClient(location.container);
- Azure::Storage::Blobs::ListBlobsOptions options;
- options.Prefix = internal::EnsureTrailingSlash(location.path);
- // https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch#remarks
- //
- // Only supports up to 256 subrequests in a single batch. The
- // size of the body for a batch request can't exceed 4 MB.
- const int32_t kNumMaxRequestsInBatch = 256;
- options.PageSizeHint = kNumMaxRequestsInBatch;
+ return DeleteDirContentsWihtoutHierarchicalNamespace(location, true);
+ }
+ }
+
+ Status DeleteDirContents(const AzureLocation& location, bool missing_dir_ok) {
+ if (location.container.empty()) {
+ return internal::InvalidDeleteDirContents(location.all);
+ }
+ if (location.path.empty()) {
+ return internal::InvalidDeleteDirContents(location.all);
Review Comment:
But what is the root dir? Doesn't that mean *all the containers*?
`s3fs` maps the root-dir concept to *all containers*:
https://github.com/apache/arrow/blob/main/cpp/src/arrow/filesystem/s3fs.cc#L2762

Refusing to delete the contents of a container can be very limiting, since `CreateDir(container/)` will create a container.
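The distinction this comment draws can be sketched as a three-way mapping, assuming the root is an empty container name and a container with an empty path means "everything in that container". `Location`, `DeleteTarget`, and `ClassifyDeleteDirContents` are hypothetical names (the real `AzureLocation` has more members, e.g. `all`). Whether the root case should actually delete every container or return an error is a separate policy choice this sketch does not settle.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for AzureLocation with only the two
// fields discussed here; the real Arrow struct has more members.
struct Location {
  std::string container;
  std::string path;
};

// What a DeleteDirContents() call would target under the proposed mapping.
enum class DeleteTarget {
  kAllContainers,      // no container: the filesystem root
  kContainerContents,  // container only: every blob in that container
  kDirectoryContents,  // container and path: blobs under "path/"
};

DeleteTarget ClassifyDeleteDirContents(const Location& location) {
  if (location.container.empty()) return DeleteTarget::kAllContainers;
  if (location.path.empty()) return DeleteTarget::kContainerContents;
  return DeleteTarget::kDirectoryContents;
}
```

Under this mapping, only the first case corresponds to the root dir; the second is an ordinary, deletable directory that `CreateDir(container/)` can produce.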
--
This is an automated message from the Apache Git Service.