sugibuchi commented on issue #43197:
URL: https://github.com/apache/arrow/issues/43197#issuecomment-2224735341
> Could you share bad scenarios you think?
There are several Azure Blob File System implementations in Python, and we
frequently need to use multiple implementations in the same code. However,
1. Most ABFS implementations, except for Arrow's `AzureFileSystem`, do not assume that ABFS URLs can contain confidential information such as storage account keys.
2. It is not always clear which implementation a library actually uses.
   * PyArrow has had a native `AzureFileSystem` implementation since Arrow 16.0.0. However, `delta-io`, which is built on Arrow, uses the Rust `object_store` crate.
   * Pandas initially used fsspec's ABFS implementation but silently started using Arrow's native implementation after the release of Arrow 16.0.0 (#41496).
   * DuckDB has native ABFS support, but the Rust `object_store` crate is eventually used when reading Delta Lake into DuckDB via the [`ibis`](https://duckdb.org/docs/guides/python/ibis.html) API.
Because of 2, an Arrow-style ABFS URL containing a storage account key can accidentally be passed to a different ABFS implementation. As explained in 1, that implementation usually does not expect the passed URL to contain a storage account key.

This leads to the URL being rejected, with an error message like the one below exposed in error logs, etc.:
```
Invalid file path: abfs://my-container:(plain text of a storage account
key)@mycontainer.dfs.core.windows.net/...
```
Embedding storage account keys in ABFS URLs can therefore cause this kind of interoperability issue with other ABFS implementations.
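As a defensive measure when such a URL has to cross implementation boundaries, the credential embedded in the userinfo part can be masked before the URL is logged or handed to another library. The sketch below is a minimal illustration of that idea, not part of any of the libraries above; the function name and the `account:key@host` URL shape are assumptions:

```python
from urllib.parse import urlsplit, urlunsplit

def redact_abfs_credentials(url: str) -> str:
    """Mask a credential embedded in the userinfo part of an ABFS-style URL.

    Keeps the account name for debugging but replaces the key with '***',
    so the URL can be safely written to error logs. (Hypothetical helper,
    assuming an Arrow-style ``abfs://account:key@host/...`` layout.)
    """
    parts = urlsplit(url)
    if "@" in parts.netloc:
        userinfo, host = parts.netloc.rsplit("@", 1)
        account = userinfo.split(":", 1)[0]  # keep the account name, drop the key
        parts = parts._replace(netloc=f"{account}:***@{host}")
    return urlunsplit(parts)

redacted = redact_abfs_credentials(
    "abfs://myaccount:SECRETKEY@myaccount.dfs.core.windows.net/container/file.parquet"
)
print(redacted)
```

A URL without a userinfo section passes through unchanged, so the helper can be applied unconditionally before logging.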