[ 
https://issues.apache.org/jira/browse/ARROW-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993393#comment-16993393
 ] 

Fabian Höring commented on ARROW-7309:
--------------------------------------

[~apitrou] Could you have a look at the PR and give feedback on what is 
needed to merge it?

I added unit tests, and the change passes the existing tests. I also tested 
it on our Criteo Hadoop cluster.

I also investigated adding new integration tests, but the existing ones 
currently use an external docker image (it seems to be this one: 
https://github.com/parrot-stream/docker-impala) for hdfs with a very basic hdfs 
config. It seems difficult to set up viewfs on it with multiple name nodes.

Please also note that this fix is important: 
https://github.com/apache/arrow/pull/5957/commits/735375ebd537fa790bcdbce346544120999cb525#diff-58910df77b218837364f0b542f889a1bL49
We can submit it as a separate PR if you wish.

This made viewfs work in the old implementation when it is set up as the 
default fs. In that case, libhdfs decides based on the config where to find the 
name nodes.
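
To illustrate the two cases above (an explicit viewfs:// or hdfs:// authority 
versus falling back to the default fs from the Hadoop config), here is a 
minimal sketch. The helper name split_hdfs_uri is hypothetical and this is not 
the code from the PR. It only assumes that libhdfs accepts a host string that 
carries the scheme, and that the host "default" means "use the configured 
default fs".

```python
from urllib.parse import urlparse


def split_hdfs_uri(uri):
    """Hypothetical helper: map a URI to (host, port, path).

    The result is shaped like the arguments one would hand to an HDFS
    connect call; it does not connect to anything itself.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("hdfs", "viewfs"):
        raise ValueError("not an hdfs:// or viewfs:// URI: %r" % uri)

    if not parsed.hostname:
        # No authority given: assume libhdfs resolves the default fs
        # (and therefore the name nodes) from the Hadoop config.
        host = "default"
    else:
        # Keep the scheme in the host so that viewfs is not treated
        # as a plain hdfs name node address.
        host = "%s://%s" % (parsed.scheme, parsed.hostname)

    # Port 0 stands for "not specified" in this sketch.
    port = parsed.port or 0
    return host, port, parsed.path
```

With this sketch, a URI without an authority such as viewfs:///tmp/x maps to 
the host "default", which corresponds to the default-fs case described above.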

> [Python] Support HDFS federation viewfs://
> ------------------------------------------
>
>                 Key: ARROW-7309
>                 URL: https://issues.apache.org/jira/browse/ARROW-7309
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.15.1
>            Reporter: Fabian Höring
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> - Add viewfs support to pyarrow.filesystem.resolve_filesystem_and_path
> -  libhdfs already supports injecting the scheme and will automatically 
> resolve federation in
>     fs = FileSystem#get(URI, conf, ugi)
> -  works with Hadoop 2/3
> see:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L770
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L637



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
