KirillTsyganov commented on code in PR #46837:
URL: https://github.com/apache/arrow/pull/46837#discussion_r2155860652
##########
python/pyarrow/_azurefs.pyx:
##########
@@ -66,6 +66,15 @@ cdef class AzureFileSystem(FileSystem):
SAS token for the storage account, used as an alternative to
account_key. If sas_token
and account_key are None the default credential will be used. The
parameters
account_key and sas_token are mutually exclusive.
+ tenant_id : str, default None
+ Tenant ID for Azure Active Directory authentication. Must be provided
together with
+ `client_id` and `client_secret` to use ClientSecretCredential.
+ client_id : str, default None
+ Client ID for Azure Active Directory authentication. Must be provided
together with
+ `tenant_id` and `client_secret` to use ClientSecretCredential.
Review Comment:
Hi,
I can definitely do that, and it would make pyarrow more future-proof, but
it's not required for my immediate need. Azure has a few different ways to
authenticate to its resources. I'm primarily concerned with the AzureML (AML)
service, which includes things like ACR, VMs, Key Vault and a storage account
as core services. Note that the VMs are managed by the AzureML service and are
not visible outside of AzureML. Also note that AzureML "buckets" things into
workspaces, meaning two different workspaces will have different VMs and
storage accounts at least.
In case it helps, I find this [diagram for credential
flow](https://learn.microsoft.com/en-au/azure/developer/python/sdk/authentication/credential-chains?tabs=dac#defaultazurecredential-overview)
useful.
Our internal setup is a little convoluted, but as far as I understand, when
working interactively, i.e. on a ComputeInstance (an AzureML compute type),
```
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
```
will resolve to the managed identity of the workspace (system-assigned for
us), which has access to the storage account. But when I run the same script
non-interactively, i.e. as an AML job, `DefaultAzureCredential` will resolve to
the managed identity of the VM (ComputeCluster), which for us has no access to
the storage account.
We have a separate Service Principal (SP) whose identity we have allowed to
access the storage account.
SPs and Managed Identities are essentially the same thing, except one is
managed by the user and the other by Microsoft.
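To make the two paths concrete, here is a minimal sketch of choosing between
the SP and the default chain. It assumes the SP settings arrive through the
`AZURE_TENANT_ID`, `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET` environment
variables (the names azure-identity's `EnvironmentCredential` reads). The
helpers `pick_credential_kind` and `build_credential` are hypothetical names
for illustration, not part of pyarrow or this PR.

```python
import os

# Assumption for this sketch: SP settings are supplied via these variables,
# following the naming convention used by azure-identity.
SP_KEYS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def pick_credential_kind(env):
    """Return "client_secret" if all three SP settings are set, else "default".

    A partial set is rejected, mirroring the "must be provided together"
    wording in the docstring under review.
    """
    present = [key for key in SP_KEYS if env.get(key)]
    if len(present) == len(SP_KEYS):
        return "client_secret"
    if present:
        missing = sorted(set(SP_KEYS) - set(present))
        raise ValueError(f"incomplete service principal settings, missing: {missing}")
    return "default"


def build_credential(env=None):
    """Build the matching azure-identity credential (hypothetical helper)."""
    env = os.environ if env is None else env
    kind = pick_credential_kind(env)
    # Imported lazily so the selection logic can be exercised without the SDK.
    from azure.identity import ClientSecretCredential, DefaultAzureCredential

    if kind == "client_secret":
        return ClientSecretCredential(
            tenant_id=env["AZURE_TENANT_ID"],
            client_id=env["AZURE_CLIENT_ID"],
            client_secret=env["AZURE_CLIENT_SECRET"],
        )
    # Falls back to the chain shown in the diagram linked above.
    return DefaultAzureCredential()
```

With something like this, the same script could use the SP inside an AML job
and fall back to `DefaultAzureCredential` on a ComputeInstance. How the three
values reach the job is left open here.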
If I were to use
```
from azure.identity import ManagedIdentityCredential
```
I would need the `client_id` of a user-assigned identity, which we can't
create. That is just our internal policy, but it is why this option is not
particularly useful for our use case.
As an aside, I'm more than happy to try to stay involved with pyarrow and
Azure-related things.