WillDyson commented on issue #26807:
URL: https://github.com/apache/arrow/issues/26807#issuecomment-1573870542

   ABFS URIs take the following form:
   abfs://<container_name>@<account_name>.dfs.core.windows.net
   
   The sanitisation done as part of the `from_uri` method appears to strip the container name, changing the URI to:
   abfs://<account_name>.dfs.core.windows.net
   
   This can be seen in the error returned – it is missing the container name.
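
To illustrate the suspected behaviour, here is a minimal Python sketch using only the standard library. It is not Arrow's actual code, and the container and account names are hypothetical. It shows that rebuilding a URI from its scheme and host alone drops the userinfo component, which is where an ABFS URI carries the container name:

```python
from urllib.parse import urlsplit

# Hypothetical container and account names, used only for illustration.
uri = "abfs://mycontainer@myaccount.dfs.core.windows.net/data/file.parquet"
parts = urlsplit(uri)

# `hostname` excludes the userinfo, so the container name is lost.
lossy = f"{parts.scheme}://{parts.hostname}"

# `netloc` is the full authority, including "<container_name>@".
preserved = f"{parts.scheme}://{parts.netloc}"

print(lossy)
print(preserved)
```

The first line printed has the same shape as the URI in the error, while the second keeps the `<container_name>@` prefix.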
   
   Possibly relevant code: [hdfs.cc](https://github.com/apache/arrow/blob/7ca7724139d3b04161369ffce04cf53e74eec54c/cpp/src/arrow/filesystem/hdfs.cc#L367) (I am not familiar with this codebase, so I may have picked the wrong code path).
   
   A similar exception can be found using the Java client:
   
   ```
   scala> FileSystem.get(new URI("abfs://bogus.dfs.core.windows.net"), new Configuration())
   23/06/02 14:50:26 WARN fs.FileSystem: Failed to initialize fileystem abfs://bogus.dfs.core.windows.net: abfs://bogus.dfs.core.windows.net has invalid authority.
   org.apache.hadoop.fs.azurebfs.contracts.exceptions.InvalidUriAuthorityException: abfs://bogus.dfs.core.windows.net has invalid authority.
     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.authorityParts(AzureBlobFileSystemStore.java:334)
     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:202)
     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:195)
     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452)
     at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:162)
     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3557)
     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3504)
     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:522)
     ... 59 elided
   
   ```
   
   Interestingly, this validation appears to happen before any connection to Azure is attempted, so you may not need an ADLS Gen2 container to reproduce this particular issue.
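
Because the failure is purely a matter of URI shape, a check of this kind can run without network access. The sketch below is a rough Python approximation of the validation that the stack trace suggests `authorityParts` performs. The helper name is hypothetical, and the real Hadoop logic may differ in detail:

```python
from urllib.parse import urlsplit


def has_container_and_account(uri: str) -> bool:
    """Return True if the authority looks like <container_name>@<account_name>."""
    authority = urlsplit(uri).netloc
    # partition() yields an empty separator when "@" is absent.
    container, sep, account = authority.partition("@")
    return bool(sep and container and account)


# The URI from the failing example has no "<container_name>@" part.
print(has_container_and_account("abfs://bogus.dfs.core.windows.net"))

# Hypothetical container and account names.
print(has_container_and_account("abfs://mycontainer@myaccount.dfs.core.windows.net"))
```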
   
   If we include a valid authority, the FileSystem is returned:
   
   ```
   scala> FileSystem.get(new URI("abfs://[email protected]"), new Configuration())
   res0: org.apache.hadoop.fs.FileSystem = AzureBlobFileSystem{uri=abfs://[email protected], user='wdyson', primaryUserGroup='wdyson'[fs.azure.capability.readahead.safe]}
   ```
   
   The wrapper around libhdfs should be modified to retain the container name 
before the @.

