sugibuchi commented on issue #43197:
URL: https://github.com/apache/arrow/issues/43197#issuecomment-2224661586
> If we reject the password value, users can't use account key based
> authentication with the URI interface. It'll be useful for local development with
> Azurite.
I use Apache Arrow mainly in Python code. Let me explain by using PyArrow as
an example.
When working with the PyArrow API, we have two methods to specify a file
system.
1. Create a `FileSystem` instance and explicitly set it as an argument.
2. Let PyArrow infer a file system from a file path URL.
```python
import pyarrow.parquet as pq
from pyarrow import fs

# 1. Explicitly set (constructor arguments elided)
s3 = fs.S3FileSystem(...)
pq.read_table("my-bucket/data.parquet", filesystem=s3)

# 2. Infer from a URL
pq.read_table("s3://my-bucket/data.parquet")
```
For 1, we don't need to embed a storage account key or any other credentials
for file system access in a file path URL as long as we can set them when we
create a file system instance.
```python
s3 = fs.S3FileSystem(access_key=...)
```
For 2, many existing file system libraries provide an interface to configure
credentials for file system access in a global context.
* Hadoop HDFS: `core-site.xml` etc. on the classpath
* Python `fsspec`: `fsspec.config.conf` ([link](https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration))
* Rust `object_store`: environment variables
* DuckDB: secret manager ([link](https://duckdb.org/docs/configuration/secrets_manager.html))
* Apache Arrow: `S3FileSystem` and `GcsFileSystem` try to obtain
credentials from environment variables or standardized locations in the local
file system
Even if it looks convenient, embedding credentials in a file path URL is
generally unnecessary. Other file system implementations work well without this
method.
In Azure, `EnvironmentCredential`, which is used by `DefaultAzureCredential`,
defines a set of environment variables to configure credentials:
* https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential
There is no standardized environment variable for setting storage account
keys, but `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` are widely used
([example](https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-data-operations-cli#set-environment-variables-for-authorization-parameters)).
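To illustrate, here is a minimal sketch of how a client could pick up these two variables from the environment instead of parsing a key out of a URL. The helper name `azure_credentials_from_env` is hypothetical, and the commented-out call at the end assumes a PyArrow build whose `fs.AzureFileSystem` accepts `account_name` and `account_key`.

```python
import os


def azure_credentials_from_env(environ=None):
    """Return (account_name, account_key) read from environment variables.

    Hypothetical helper for illustration only. It reads the widely used
    (but not standardized) AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY
    variables. account_key is None when the key variable is unset or empty.
    """
    env = os.environ if environ is None else environ
    account = env.get("AZURE_STORAGE_ACCOUNT")
    if not account:
        raise ValueError("AZURE_STORAGE_ACCOUNT is not set")
    key = env.get("AZURE_STORAGE_KEY") or None
    return account, key


# Possible usage (assumption: PyArrow built with Azure support):
#
#   from pyarrow import fs
#   account, key = azure_credentials_from_env()
#   azure = fs.AzureFileSystem(account_name=account, account_key=key)
```

With something like this, the account key never has to appear in a file path URL, which matches how the other libraries listed above handle credentials.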
We should consider these common practices instead of inventing new URL
syntax.