Juan Galvez created ARROW-10872:
-----------------------------------
Summary: pyarrow.fs.HadoopFileSystem cannot access Azure Data Lake
(ADLS)
Key: ARROW-10872
URL: https://issues.apache.org/jira/browse/ARROW-10872
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 2.0.0
Reporter: Juan Galvez
It's not possible to open an `abfs://` or `abfss://` URI with
pyarrow.fs.HadoopFileSystem.
HadoopFileSystem.from_uri(path) fails: libhdfs throws an error saying that the
authority is invalid (I checked, and this is because the authority string it
receives is empty).
Note that the legacy pyarrow.hdfs.HadoopFileSystem interface does work, for
example with either of:
* pyarrow.hdfs.HadoopFileSystem(host="abfs://[email protected]")
* pyarrow.hdfs.connect(host="abfs://[email protected]")
and I believe the new interface should work too when the full URI is passed as
"host" to the `pyarrow.fs.HadoopFileSystem` constructor. However, the
constructor wrongly prepends "hdfs://" to the host:
https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/python/pyarrow/_hdfs.pyx#L64
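To illustrate the behavior I would expect, here is a minimal sketch of scheme-aware host handling. This is not the actual pyarrow code: `normalize_host` is a hypothetical helper, and the `abfs://` host below is a placeholder, not a real account.

```python
def normalize_host(host: str) -> str:
    """Hypothetical helper: add "hdfs://" only when the host has no scheme."""
    # "default" asks libhdfs to use the filesystem configured as the default,
    # so it must be passed through unchanged.
    if host == "default" or "://" in host:
        return host
    # A bare host name is assumed to be an HDFS namenode.
    return "hdfs://" + host


# Placeholder hosts for illustration only.
for host in ("namenode", "default", "abfs://container@account.example.net"):
    print(host, "->", normalize_host(host))
```

With handling along these lines, a host that already carries a scheme such as `abfs://` or `abfss://` would reach libhdfs unmodified, as it does with the legacy interface.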