felipecrv commented on code in PR #41068:
URL: https://github.com/apache/arrow/pull/41068#discussion_r1557814378


##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1642,11 +1642,27 @@ class AzureFileSystem::Impl {
       options.Prefix = {};
       found = true;  // Unless the container itself is not found later!
     } else {
-      options.Prefix = internal::EnsureTrailingSlash(base_location.path);
+      ARROW_ASSIGN_OR_RAISE(
+          auto prefix, AzureLocation::FromString(
+                           
std::string(internal::EnsureTrailingSlash(select.base_dir))));
+      ARROW_ASSIGN_OR_RAISE(auto info, GetFileInfo(container_client, prefix));
+      if (info.type() == FileType::NotFound) {
+        if (select.allow_not_found) {
+          return Status::OK();
+        } else {
+          return PathNotFound(base_location);
+        }
+      } else if (info.type() != FileType::Directory) {
+        return NotADir(base_location);
+      }
+      options.Prefix = prefix.path;
     }
     options.PageSizeHint = page_size_hint;
     options.Include = Blobs::Models::ListBlobsIncludeFlags::Metadata;
 
+    auto adlfs_client = GetFileSystemClient(base_location.container);
+    ARROW_ASSIGN_OR_RAISE(auto hns_support, 
HierarchicalNamespaceSupport(adlfs_client));

Review Comment:
   This fix is very confusing to me. This function is 
`GetFileInfoWithSelectorFromContainer`, it's meant to be called when HNS 
support has been detected as disabled.
   
   The right fix (IMO) is to create 
`GetFileInfoWithSelectorFromFileSystem(adlfs_client, ...)` and then in 
`GetFileInfoWithSelector` dispatch to the different implementations according 
to the HNS support detection.



##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -2168,13 +2203,6 @@ class AzureFileSystem::Impl {
       // All the others either succeed or throw an exception.
       DCHECK(response.Value.Deleted);
     } catch (const Storage::StorageException& exception) {
-      if (exception.ErrorCode == "FilesystemNotFound" ||
-          exception.ErrorCode == "PathNotFound") {
-        if (require_dir_to_exist) {
-          return PathNotFound(location);
-        }
-        return Status::OK();
-      }

Review Comment:
   Given the TZ differences, I will give myself a chance of fixing these bugs 
to see if I can come up with something that doesn't introduce pessimistic round 
trips to the backend.



##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -2168,13 +2203,6 @@ class AzureFileSystem::Impl {
       // All the others either succeed or throw an exception.
       DCHECK(response.Value.Deleted);
     } catch (const Storage::StorageException& exception) {
-      if (exception.ErrorCode == "FilesystemNotFound" ||
-          exception.ErrorCode == "PathNotFound") {
-        if (require_dir_to_exist) {
-          return PathNotFound(location);
-        }
-        return Status::OK();
-      }

Review Comment:
   The fact that you check existence above doesn't mean these errors can't 
happen down here.
   
   To use terms that @pitrou once shared with me [1], you're moving from EAFP 
[2] "easier to ask for permission" to LBYL [3] "look before you leap".
   
   From [3]:
   > ### LBYL
   > Look before you leap. This coding style explicitly tests for 
pre-conditions before making calls or lookups. This style contrasts with the 
[EAFP](https://docs.python.org/3.5/glossary.html#term-eafp) approach and is 
characterized by the presence of many 
[if](https://docs.python.org/3.5/reference/compound_stmts.html#if) statements.
   >
   > In a multi-threaded environment, **the LBYL approach can risk introducing 
a race condition** between “the looking” and “the leaping”. For example, the 
code, if key in mapping: return mapping[key] can fail if another thread removes 
key from mapping after the test, but before the lookup. This issue can be 
solved with locks or by using the EAFP approach.
   
   
   [1] https://devblogs.microsoft.com/python/idiomatic-python-eafp-versus-lbyl/
   [2] https://docs.python.org/3.5/glossary.html#term-eafp
   [3] https://docs.python.org/3.5/glossary.html#term-lbyl



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to