sugibuchi commented on issue #43197: URL: https://github.com/apache/arrow/issues/43197#issuecomment-2224735341
> Could you share bad scenarios you think?

There are several Azure Blob File System (ABFS) implementations in Python, and we frequently need to use multiple implementations in the same code. However:

1. Most ABFS implementations, except for Arrow's `AzureFileSystem`, do not assume that ABFS URLs can contain confidential information such as storage account keys.
2. It is not always clear which implementation a library actually uses:
   * PyArrow has had a native `AzureFileSystem` implementation since Arrow 16.0.0, but `delta-io`, which is built on Arrow, uses the Rust `object_store`.
   * Pandas initially used fsspec's ABFS implementation but silently switched to Arrow's native implementation after the release of Arrow 16.0.0 (#41496).
   * DuckDB has native ABFS support, but the Rust `object_store` is eventually used when reading Delta Lake into DuckDB via the [`ibis`](https://duckdb.org/docs/guides/python/ibis.html) API.

Because of 2, an Arrow-style ABFS URL containing a storage account key can accidentally be passed to a different ABFS implementation. As explained in 1, that implementation usually does not expect the URL to contain a storage account key, so it rejects the URL, and an error message like the one below ends up in error logs, etc.:

```
Invalid file path: abfs://my-container:(plain text of a storage account key)@mycontainer.dfs.core.windows.net/...
```

Embedding storage account keys in ABFS URLs can therefore cause this kind of interoperability issue with other ABFS implementations.
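To make the failure mode concrete, here is a minimal stdlib-only sketch of the parsing difference. The URL, account name, and key below are made up for illustration; it only shows that a key placed in the userinfo part of the authority is extracted cleanly by an implementation that expects credentials there, while an implementation that treats the whole authority as a hostname will echo the raw URL, key included, into its error output:

```python
from urllib.parse import urlsplit

# Hypothetical Arrow-style ABFS URL with a fake storage account key
# embedded in the userinfo section of the authority.
url = "abfs://my-container:FAKEKEY123@myaccount.dfs.core.windows.net/data/file.parquet"

parts = urlsplit(url)

# An implementation that expects credentials in the URL can separate them:
assert parts.username == "my-container"
assert parts.password == "FAKEKEY123"   # the secret never needs to be logged
assert parts.hostname == "myaccount.dfs.core.windows.net"

# An implementation that does NOT expect credentials typically fails to
# recognize the authority and reports the full, unredacted URL, e.g.:
print(f"Invalid file path: {url}")  # the fake key appears in plain text
```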