martin-traverse commented on issue #47353: URL: https://github.com/apache/arrow/issues/47353#issuecomment-3195965432
Hi - thanks for getting back to me. I already tried credential_kind = cli, which didn't work. Using credential_kind = workload_identity now gives a different error:

    Traceback (most recent call last):
      File "/__w/tracdap/tracdap/tracdap-runtime/python/src/tracdap/rt/_impl/core/storage.py", line 575, in _wrap_operation
        return func(operation_name, storage_path, *args, **kwargs)
      File "/__w/tracdap/tracdap/tracdap-runtime/python/src/tracdap/rt/_impl/core/storage.py", line 391, in _mkdir
        prior_stat: pa_fs.FileInfo = self._fs.get_file_info(resolved_path)
                                     ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
      File "pyarrow/_fs.pyx", line 615, in pyarrow._fs.FileSystem.get_file_info
      File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
      File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
    pyarrow.lib.ArrowException: Unknown error: Check for Hierarchical Namespace support on 'https://tracdapcistorage1.blob.core.windows.net/tracdap-ci-storage' failed: N5Azure4Core11Credentials23AuthenticationExceptionE: WorkloadIdentityCredential authentication unavailable. Azure Kubernetes environment is not set up correctly.

I'm not certain whether Arrow should be doing workload_identity in code, or whether it should be doing cli (because the login already happened in a GitHub Action) and the problem is that it doesn't accept the cli login for a workload service principal.

One thing that might be relevant - I had a look at the release notes for the Azure identity SDK and there have been quite a few fixes / additions recently: https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/identity/azure-identity/CHANGELOG.md

I'm not entirely clear on how the Azure SDK version in the Arrow build scripts relates to the individual component libraries, but it seems safe to say there are several recent fixes / features that are not being picked up yet. Perhaps just moving the Azure identity SDK to the latest stable release might sort this out?

The best result obviously is that we can use the AzureFileSystem() constructor with just account_name and it works automatically through the default mechanism. We can do explicit setup with from_uri() if need be although it's slightly less obvious for a long-lived fs object. You can see our [setup code here](https://github.com/martin-traverse/tracdap/blob/feature/azure-arrow-native/tracdap-runtime/python/src/tracdap/rt/_plugins/storage_azure.py) if you like - it's pretty straightforward.
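
For anyone trying to reproduce this, here is a minimal sketch of the explicit from_uri() route, probing one credential kind at a time. It is not the tracdap setup code linked above. The helper names `build_abfs_uri` and `probe_credentials` are made up for illustration, and the `abfs://<container>@<account>.dfs.core.windows.net/` URI form with `credential_kind` as a query parameter is an assumption based on my reading of the Arrow Azure filesystem docs, so please check it against the pyarrow version in use.

```python
from urllib.parse import urlencode

# Credential kinds mentioned in this thread. Whether a given pyarrow build
# accepts each one depends on the Arrow / Azure SDK versions it was built with.
CREDENTIAL_KINDS = ("cli", "workload_identity")


def build_abfs_uri(account, container, credential_kind=None):
    """Build an abfs:// URI (hypothetical helper; URI form is an assumption)."""
    uri = f"abfs://{container}@{account}.dfs.core.windows.net/"
    if credential_kind:
        uri += "?" + urlencode({"credential_kind": credential_kind})
    return uri


def probe_credentials(account, container):
    """Try each credential kind and record whether a stat call succeeds."""
    # Imported lazily so build_abfs_uri() works without pyarrow installed.
    import pyarrow.fs as pa_fs

    results = {}
    for kind in CREDENTIAL_KINDS:
        uri = build_abfs_uri(account, container, kind)
        try:
            fs, path = pa_fs.FileSystem.from_uri(uri)
            # get_file_info() is the call that fails in the traceback above.
            fs.get_file_info(path)
            results[kind] = "ok"
        except Exception as exc:
            results[kind] = f"{type(exc).__name__}: {exc}"
    return results
```

Calling `probe_credentials("tracdapcistorage1", "tracdap-ci-storage")` inside the GitHub Actions job after the Azure login step should show which credential kinds, if any, get past the Hierarchical Namespace check.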