sugibuchi opened a new issue, #43197:
URL: https://github.com/apache/arrow/issues/43197

   ### Describe the enhancement requested
   
   ## Outline
   
   The Azure Blob File System (ABFS) support in Apache Arrow, implemented in 
the C++ API by #18014 and integrated into the Python API by #39968, currently 
allows embedding a storage account key as a password in an ABFS URL.
   
   
https://github.com/apache/arrow/blob/r-16.1.0/cpp/src/arrow/filesystem/azurefs.h#L138-L144
   
   However, I strongly recommend stopping this practice for two reasons.
   
   ## Security
   
   An access key of a storage account is practically a "root password," giving 
full access to the data in the storage account.
   
   Microsoft repeatedly emphasises this point throughout its documentation 
and recommends keeping shared keys in a secure place such as Azure Key Vault.
   
   > ## Protect your access keys
   > Storage account access keys provide full access to the storage account 
data and the ability to generate SAS tokens. Always be careful to protect your 
access keys. Use Azure Key Vault to manage and rotate your keys securely.
   > 
https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string#protect-your-access-keys
   
   URLs are usually not treated as confidential information, so embedding a 
storage account key in an ABFS URL may lead to unexpected exposure of the key.
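
To illustrate the exposure risk, here is a minimal Python sketch using only the standard library. The account, container, and key are made up, and `redact_password` is a hypothetical helper, not part of Arrow or any Azure SDK. It shows that any generic URL parser can recover a key embedded in the `<container>:<password>@` position, and that callers would have to mask it themselves before logging.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_password(url: str) -> str:
    """Mask the password component of a URL before logging it (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # no "user:password@" form present
    userinfo, _, host = parts.netloc.rpartition("@")
    user, _, _ = userinfo.partition(":")
    return urlunsplit(parts._replace(netloc=f"{user}:***@{host}"))


# Made-up account, container, and key, for illustration only.
url = "abfs://mycontainer:FAKEKEY123@myaccount.dfs.core.windows.net/path/to/file"
print(urlsplit(url).password)  # the key is recoverable by any URL parser
print(redact_password(url))
```

Such masking only works when the key sits in the password position. In the `<password>@<account>` form, the key occupies the user position and cannot be told apart from a container name by syntax alone.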
   
   ## Interoperability with other file system implementations
   
   For historical reasons, the Azure Blob File System (ABFS) URL schemes are 
inconsistent between different file system implementations.
   
   Original implementations by Apache Hadoop's `hadoop-azure` package 
[link](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-abfs-driver#uri-scheme-to-reference-data)
   * abfs[s]://\<container\>@\<account\>.dfs.core.windows.net/path/to/file
   
   These URL schemes are widely used, particularly by Apache Spark.
   
   Python `adlfs` for `fsspec` 
[link](https://github.com/fsspec/adlfs?tab=readme-ov-file#quickstart) 
   * Hadoop-compatible URL schemes, and
   * az://\<container\>/path/to/file
   * abfs[s]://\<container\>/path/to/file
   
   Rust `object_store::azure` 
[link](https://docs.rs/object_store/latest/src/object_store/azure/builder.rs.html#473-487)
   * Hadoop-compatible URL schemes 
   * adlfs-compatible URL schemes, and
   * azure://\<container\>/path/to/file
   * https://\<account\>.blob.core.windows.net/\<container\>/path/to/file
   * https://\<account\>.dfs.core.windows.net/\<container\>/path/to/file
   
   DuckDB `azure` extension 
[link](https://duckdb.org/docs/extensions/azure#for-azure-data-lake-storage-adls)
   * abfss://\<container\>/path/to/file
   * abfss://\<account\>.dfs.core.windows.net/\<container\>/path/to/file
   
   Apache Arrow 
[link](https://github.com/apache/arrow/blob/r-16.1.0/cpp/src/arrow/filesystem/azurefs.h#L138-L144)
 
   * Hadoop-compatible URL schemes, and
   * abfs[s]://\<container\>:\<password\>@\<account\>.dfs.core.windows.net/path/to/file
   * abfs[s]://\<account\>.dfs.core.windows.net/\<container\>/path/to/file
   * abfs[s]://\<password\>@\<account\>.dfs.core.windows.net/\<container\>/path/to/file
   
   This inconsistency of the URL schemes already causes problems in 
applications that combine different frameworks, including the additional 
overhead of translating ABFS URLs between schemes. It may also lead to 
unexpected behaviour when different file system implementations interpret the 
same URL differently.
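
The translation overhead and the ambiguity can be sketched in Python as follows. All function names and the sample URL are hypothetical. The two readers simply apply two of the URL patterns listed above to the same string. They do not reproduce how any particular implementation actually resolves the URL.

```python
from urllib.parse import urlsplit


def read_container_first(url: str) -> dict:
    """Read the URL as <container>@<account>.dfs.core.windows.net/<path>."""
    parts = urlsplit(url)
    return {
        "account": parts.hostname.split(".")[0],
        "container": parts.username,
        "path": parts.path.lstrip("/"),
    }


def read_container_in_path(url: str) -> dict:
    """Read the URL as [<userinfo>@]<account>.dfs.core.windows.net/<container>/<path>."""
    parts = urlsplit(url)
    container, _, path = parts.path.lstrip("/").partition("/")
    return {
        "account": parts.hostname.split(".")[0],
        "container": container,
        "path": path,
    }


def to_az_url(location: dict) -> str:
    """Render a parsed location in the az://<container>/<path> form."""
    return f"az://{location['container']}/{location['path']}"


# Made-up account and container, for illustration only.
url = "abfss://data@myaccount.dfs.core.windows.net/raw/2024/file.parquet"
print(to_az_url(read_container_first(url)))    # az://data/raw/2024/file.parquet
print(to_az_url(read_container_in_path(url)))  # az://raw/2024/file.parquet
```

The same string yields two different containers depending on which pattern the reader assumes, which is the kind of misinterpretation described above.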
   
   I believe a new file system implementation should respect the existing URL 
schemes and SHOULD NOT invent new ones. As far as I understand, no other ABFS 
file system implementation allows embedding storage account keys in ABFS URLs. 
   
   ### Component(s)
   
   C++, Python

