WillDyson commented on issue #26807: URL: https://github.com/apache/arrow/issues/26807#issuecomment-1573870542
ABFS URIs take the following form:

`abfs://<container_name>@<account_name>.dfs.core.windows.net`

It looks like the sanitisation that's done as part of the from_uri method ends up changing it to:

`abfs://<account_name>.dfs.core.windows.net`

This can be seen in the error returned: it is missing the container name.

CC: [hdfs.cc](https://github.com/apache/arrow/blob/7ca7724139d3b04161369ffce04cf53e74eec54c/cpp/src/arrow/filesystem/hdfs.cc#L367) (not familiar with this codebase so I may have picked up the wrong codepath)

A similar exception can be found using the Java client:

```
scala> FileSystem.get(new URI("abfs://bogus.dfs.core.windows.net"), new Configuration())
23/06/02 14:50:26 WARN fs.FileSystem: Failed to initialize fileystem abfs://bogus.dfs.core.windows.net: abfs://bogus.dfs.core.windows.net has invalid authority.
org.apache.hadoop.fs.azurebfs.contracts.exceptions.InvalidUriAuthorityException: abfs://bogus.dfs.core.windows.net has invalid authority.
  at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.authorityParts(AzureBlobFileSystemStore.java:334)
  at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:202)
  at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:195)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452)
  at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:162)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3557)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3504)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:522)
  ... 59 elided
```

Interestingly, this all appears to happen before a connection to Azure is attempted, so you may not need an ADLS Gen2 container to validate this particular issue.
If we include a valid authority, the FileSystem is returned:

```
scala> FileSystem.get(new URI("abfs://[email protected]"), new Configuration())
res0: org.apache.hadoop.fs.FileSystem = AzureBlobFileSystem{uri=abfs://[email protected], user='wdyson', primaryUserGroup='wdyson'[fs.azure.capability.readahead.safe]}
```

The wrapper around libhdfs should be modified to retain the container name before the @.
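To illustrate the difference between dropping and retaining the part before the @, here is a minimal Python sketch using only the standard library. It is not Arrow's actual code: `host_only` and `keep_authority` are hypothetical helpers, and `mycontainer` / `myaccount` are placeholder names. The assumption being modelled is that the URI is rebuilt from the host alone, which would discard the container.

```python
from urllib.parse import urlsplit, urlunsplit


def host_only(uri: str) -> str:
    """Rebuild the URI from the hostname alone (hypothetical model of the bug).

    Anything before the '@' in the authority is silently discarded.
    """
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, parts.hostname or "", parts.path, "", ""))


def keep_authority(uri: str) -> str:
    """Rebuild the URI from the full authority, keeping the part before '@'."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Placeholder container and account names, chosen for illustration only.
uri = "abfs://mycontainer@myaccount.dfs.core.windows.net"

print(urlsplit(uri).username)  # the container name sits in the userinfo slot
print(host_only(uri))          # container name is lost
print(keep_authority(uri))     # container name is preserved
```

As with the Java client above, none of this needs a connection to Azure, so the behaviour can be checked offline.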
