Juan Galvez created ARROW-10872:
-----------------------------------

             Summary: pyarrow.fs.HadoopFileSystem cannot access Azure Data Lake (ADLS)
                 Key: ARROW-10872
                 URL: https://issues.apache.org/jira/browse/ARROW-10872
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 2.0.0
            Reporter: Juan Galvez


It's not possible to open an `abfs://` or `abfss://` URI with 
pyarrow.fs.HadoopFileSystem.

Using HadoopFileSystem.from_uri(path) does not work: libhdfs throws an error 
saying that the URI authority is invalid (I checked, and this is because the 
authority string is empty).
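For reference, the authority of an ABFS URI is the `container@account` part that follows the scheme. The standard-library sketch below uses a hypothetical placeholder URI in the form `abfs://<container>@<account>.dfs.core.windows.net/<path>` (not the URI from this report) and shows that this component is non-empty for a well-formed ABFS URI.

```python
from urllib.parse import urlsplit

# Hypothetical placeholder URI; container, account and path are made up.
uri = "abfs://mycontainer@myaccount.dfs.core.windows.net/data/file.parquet"

parts = urlsplit(uri)
print(parts.scheme)    # abfs
print(parts.netloc)    # mycontainer@myaccount.dfs.core.windows.net (the authority)
print(parts.username)  # mycontainer (the container)
print(parts.hostname)  # myaccount.dfs.core.windows.net (the storage account endpoint)
print(parts.path)      # /data/file.parquet
```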

Note that the legacy pyarrow.hdfs.HadoopFileSystem interface works, for 
example:
 * pyarrow.hdfs.HadoopFileSystem(host="abfs://[email protected]")
 * pyarrow.hdfs.connect(host="abfs://[email protected]")

and I believe the new interface should also work when the full URI is passed 
as "host" to the `pyarrow.fs.HadoopFileSystem` constructor. However, the 
constructor incorrectly prepends "hdfs://" to the host: 
https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/python/pyarrow/_hdfs.pyx#L64
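The host handling can be illustrated with a pure-Python sketch. It does not reproduce the code at the linked line: `prepend_always` is a hypothetical model of the behavior described above, and `normalize_host` is one possible fix, assuming that a host which already carries a scheme (such as `abfs://`) should be passed through unchanged. The special case for `"default"` assumes pyarrow's documented meaning of that value (use `fs.defaultFS` from core-site.xml). The ABFS host is a made-up placeholder.

```python
def prepend_always(host: str) -> str:
    # Hypothetical model of the reported behavior: "hdfs://" is added to
    # every host, so an ABFS URI turns into "hdfs://abfs://...".
    return "hdfs://" + host


def normalize_host(host: str) -> str:
    # Hypothetical fix: leave "default" and any host that already contains
    # a scheme untouched; only bare host names get the "hdfs://" prefix.
    if host == "default" or "://" in host:
        return host
    return "hdfs://" + host


abfs_host = "abfs://mycontainer@myaccount.dfs.core.windows.net"  # placeholder

print(prepend_always(abfs_host))   # hdfs://abfs://mycontainer@myaccount.dfs.core.windows.net
print(normalize_host(abfs_host))   # abfs://mycontainer@myaccount.dfs.core.windows.net
print(normalize_host("namenode"))  # hdfs://namenode
print(normalize_host("default"))   # default
```

Checking for `"://"` rather than calling `urlsplit` keeps the sketch independent of how different Python versions parse bare `host:port` strings.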



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
