KirillTsyganov commented on code in PR #46837: URL: https://github.com/apache/arrow/pull/46837#discussion_r2155860652
########## python/pyarrow/_azurefs.pyx: ########## @@ -66,6 +66,15 @@ cdef class AzureFileSystem(FileSystem): SAS token for the storage account, used as an alternative to account_key. If sas_token and account_key are None the default credential will be used. The parameters account_key and sas_token are mutually exclusive. + tenant_id : str, default None + Tenant ID for Azure Active Directory authentication. Must be provided together with + `client_id` and `client_secret` to use ClientSecretCredential. + client_id : str, default None + Client ID for Azure Active Directory authentication. Must be provided together with + `tenant_id` and `client_secret` to use ClientSecretCredential. Review Comment: Hi, I can definitely do that, this would make pyarrow more future proof, but for my immediate need, that's not required. Azure has a few different ways to authentic to it's resources. I'm primarily concerned about AzureML (AML) service, which includes things like ACR, VMs, keyvault and storage account as core services. Note that VM's are managed by AzureML service and are not visible outside of AzureML. Also note that AzureML "buckets" things into workspace, meaning two different workspace will have different VM's and storage accounts at least. Just in case, but I find this [diagram for credential flow](https://learn.microsoft.com/en-au/azure/developer/python/sdk/authentication/credential-chains?tabs=dac#defaultazurecredential-overview) useful. Our internal setup is a little convoluted, but as far as I understand when working interactively i.e on ComputeInstance (AzureML compute type) ``` from azure.identity import DefaultAzureCredential credential = DefaultAzureCredential() ``` will resolve to managed identity (system assigned for us) of the workspace, which has access to storage account. But when I'm running the same script non-iteractively i.e as a AML job `DefaultAzureCredential` will resolve to managed identity of the VM (ComputeCluster), which for us has no access to storage account. We have a separate Service Principle (SP) whose identity we allowed access to storage account. SP and Managed Identities are essentially the same thing, except one is managed by user and another by Microsoft. If I to use ``` from azure.identity import ManagedIdentityCredential ``` I would need `client_id` of user assigned identity, which we can't make, but this is just our internal policy, hence why for our use case it's not particular useful. as an aside, more than happy to try and stay involved with pyarrow and Azure related things -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org