[ https://issues.apache.org/jira/browse/ARROW-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367451#comment-17367451 ]

Steve Loughran commented on ARROW-10872:
----------------------------------------

This problem would also surface if {{file://}} were used as the source URL, 
which may permit local replication. (Note: {{MiniDFSCluster}} is a class in the 
hadoop-hdfs test JAR that lets you bring up an HDFS cluster in-process, purely 
for testing.)
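
A minimal sketch of that failure mode (untested; the local path is a 
hypothetical placeholder):

{code:python}
import pyarrow.fs

# A file:// URI carries no HDFS authority, so the same "invalid authority"
# error from libhdfs would be expected here as with abfs:// URIs.
fs = pyarrow.fs.HadoopFileSystem.from_uri("file:///tmp/source-data")
{code}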

> [Python] pyarrow.fs.HadoopFileSystem cannot access Azure Data Lake (ADLS)
> -------------------------------------------------------------------------
>
>                 Key: ARROW-10872
>                 URL: https://issues.apache.org/jira/browse/ARROW-10872
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 2.0.0
>            Reporter: Juan Galvez
>            Priority: Major
>              Labels: hdfs
>
> It's not possible to open an {{abfs://}} or {{abfss://}} URI with 
> pyarrow.fs.HadoopFileSystem.
> Using {{HadoopFileSystem.from_uri(path)}} does not work: libhdfs throws an 
> error saying that the authority is invalid (I checked, and this is because 
> the authority string is empty).
> Note that the legacy pyarrow.hdfs.HadoopFileSystem interface works, for 
> example:
>  * {{pyarrow.hdfs.HadoopFileSystem(host="abfs://[email protected]")}}
>  * {{pyarrow.hdfs.connect(host="abfs://[email protected]")}}
> and I believe the new interface should work too by passing the full URI as 
> "host" to the {{pyarrow.fs.HadoopFileSystem}} constructor. However, the 
> constructor wrongly prepends "hdfs://" at the beginning: 
> [https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/python/pyarrow/_hdfs.pyx#L64]
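> A minimal reproduction sketch of both calls (the ADLS account and container 
> names are hypothetical placeholders):
> {code:python}
> import pyarrow.fs
> import pyarrow.hdfs
>
> uri = "abfs://container@myaccount.dfs.core.windows.net"
>
> # New API: fails with an "invalid authority" error from libhdfs, because
> # the constructor prepends "hdfs://" and the parsed authority is empty.
> fs = pyarrow.fs.HadoopFileSystem.from_uri(uri)
>
> # Legacy API: works, because the full URI is passed straight through to
> # libhdfs as the "host".
> fs = pyarrow.hdfs.connect(host=uri)
> {code}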



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
