sugibuchi commented on issue #43197:
URL: https://github.com/apache/arrow/issues/43197#issuecomment-2224661586

   > If we reject the password value, users can't use account key based 
authentication with the URI interface. It'll be useful for local development with 
Azurite.
   
   I use Apache Arrow mainly in Python code. Let me explain by using PyArrow as 
an example.
   
   When working with the PyArrow API, we have two methods to specify a file 
system.
   
   1. Create a `FileSystem` instance and explicitly set it as an argument.
   2. Let PyArrow infer a file system from a file path URL.
   
   ```python
   import pyarrow.parquet as pq
   from pyarrow import fs
   
   # Explicitly set
   s3 = fs.S3FileSystem(...)
   pq.read_table("my-bucket/data.parquet", filesystem=s3)
   
   # Infer from a URL
   pq.read_table("s3://my-bucket/data.parquet")
   ```
   
   For 1, we don't need to embed a storage account key or any other credentials 
for file system access in a file path URL as long as we can set them when we 
create a file system instance.
   
   ```python
   s3 = fs.S3FileSystem(access_key=...)
   ```
   
   For 2, many existing file system libraries provide an interface to configure 
credentials for file system access in a global context.
   
   * Hadoop HDFS: `core-site.xml` etc. on the classpath
   * Python `fsspec`: `fsspec.config.conf` 
[link](https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration)
   * Rust `object_store`: environment variables
   * DuckDB: secret manager 
[link](https://duckdb.org/docs/configuration/secrets_manager.html)
   * Apache Arrow: `S3FileSystem` and `GcsFileSystem` try to obtain 
credentials from environment variables or standardized locations in the local 
file system
   
   Even if it looks convenient, embedding credentials in a file path URL is 
generally unnecessary. Other file system implementations work well without this 
method.
   
   In Azure, `EnvironmentCredential`, which is used by `DefaultAzureCredential`, 
defines a set of environment variables for configuring credentials:
   
   * 
https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential
   
   There is no standardized environment variable for setting storage account 
keys, but `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` are widely used 
([example](https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-data-operations-cli#set-environment-variables-for-authorization-parameters)).
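
   As a rough sketch of what this could look like from Python, the snippet below reads the two variables and builds the file system options from them. The helper names `azure_options_from_env` and `azure_filesystem_from_env` are hypothetical, and the second one assumes a PyArrow build that provides `fs.AzureFileSystem` with `account_name` and `account_key` keyword arguments.

```python
import os


def azure_options_from_env(environ=None):
    """Build file system options from conventional environment variables.

    Hypothetical helper for illustration; it is not part of PyArrow.
    """
    environ = os.environ if environ is None else environ
    account = environ.get("AZURE_STORAGE_ACCOUNT")
    if not account:
        raise KeyError("AZURE_STORAGE_ACCOUNT is not set")
    options = {"account_name": account}
    key = environ.get("AZURE_STORAGE_KEY")
    if key:
        # Pass the key only when one is configured, so that other
        # credential mechanisms remain usable when it is absent.
        options["account_key"] = key
    return options


def azure_filesystem_from_env():
    """Create a file system without putting any secret in a URL.

    Assumes a PyArrow build that provides fs.AzureFileSystem and that it
    accepts account_name / account_key keyword arguments.
    """
    from pyarrow import fs  # imported lazily; only needed for this call

    return fs.AzureFileSystem(**azure_options_from_env())
```

   With something like this, the file path stays free of credentials, e.g. `pq.read_table("my-container/data.parquet", filesystem=azure_filesystem_from_env())`.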
   
   We should consider these common practices instead of inventing a new URL 
syntax.
   

