KirillTsyganov commented on code in PR #46837:
URL: https://github.com/apache/arrow/pull/46837#discussion_r2155860652


##########
python/pyarrow/_azurefs.pyx:
##########
@@ -66,6 +66,15 @@ cdef class AzureFileSystem(FileSystem):
         SAS token for the storage account, used as an alternative to 
account_key. If sas_token
         and account_key are None the default credential will be used. The 
parameters 
         account_key and sas_token are mutually exclusive.
+    tenant_id : str, default None
+        Tenant ID for Azure Active Directory authentication. Must be provided 
together with
+        `client_id` and `client_secret` to use ClientSecretCredential.
+    client_id : str, default None
+        Client ID for Azure Active Directory authentication. Must be provided 
together with
+        `tenant_id` and `client_secret` to use ClientSecretCredential.

Review Comment:
   Hi, 
   
   I can definitely do that, this would make pyarrow more future proof, but for 
my immediate need, that's not required. Azure has a few different ways to 
authentic to it's resources. I'm primarily concerned about AzureML (AML) 
service, which includes things like ACR, VMs, keyvault and storage account as 
core services. Note that VM's are managed by AzureML service and are not 
visible outside of AzureML. Also note that AzureML "buckets" things into 
workspace, meaning two different workspace will have different VM's and storage 
accounts at least.
   
   Just in case, but I find this [diagram for credential 
flow](https://learn.microsoft.com/en-au/azure/developer/python/sdk/authentication/credential-chains?tabs=dac#defaultazurecredential-overview)
 useful.
   
   Our internal setup is a little convoluted, but as far as I understand when 
working interactively i.e on ComputeInstance (AzureML compute type)
   
   ```
   from azure.identity import DefaultAzureCredential
   
   credential = DefaultAzureCredential()
   ```
   
   will resolve to managed identity (system assigned for us) of the workspace, 
which has access to storage account. But when I'm running the same script 
non-iteractively i.e as a AML job `DefaultAzureCredential` will resolve to 
managed identity of the VM (ComputeCluster), which for us has no access to 
storage account. 
   
   We have a separate Service Principle (SP) whose identity we allowed access 
to storage account. 
   
   SP and Managed Identities are essentially the same thing, except one is 
managed by user and another by Microsoft. 
   
   If I to use
   
   ```
   from azure.identity import ManagedIdentityCredential
   ```
   
   I would need `client_id` of user assigned identity, which we can't make, but 
this is just our internal policy, hence why for our use case it's not 
particular useful.
   
   as an aside, more than happy to try and stay involved with pyarrow and Azure 
related things



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to