[
https://issues.apache.org/jira/browse/ARROW-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987868#comment-16987868
]
Fabian Höring commented on ARROW-7309:
--------------------------------------
[~apitrou]
I had a look at the new implementations. I get the idea.
So what I need is the new HDFS wrapper, and then a Python wrapper,
resolve_filesystem_and_path, that exposes all those filesystems based on the
path.
The registry idea from
[fsspec|https://github.com/intake/filesystem_spec/blob/master/fsspec/registry.py]
is nice, or just if blocks based on the scheme (as it is handled now).
Also some internal caching would be nice (for HDFS filesystems, we have many
different namenodes).
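A minimal sketch of what I mean by a registry plus caching (all names here are hypothetical, not existing pyarrow API): each scheme maps to a factory, and created filesystems are cached per (scheme, netloc) so every namenode gets its own reused instance.

```python
from urllib.parse import urlparse

# Hypothetical registry: scheme -> factory callable that builds a filesystem.
_registry = {}
# Hypothetical cache: (scheme, netloc) -> filesystem instance, so repeated
# lookups for the same namenode reuse one connection.
_cache = {}

def register_filesystem(scheme, factory):
    _registry[scheme] = factory

def get_filesystem(uri):
    parsed = urlparse(uri)
    key = (parsed.scheme, parsed.netloc)
    if key not in _cache:
        _cache[key] = _registry[parsed.scheme](parsed)
    return _cache[key]
```

This keeps the dispatch table open for extension (viewfs://, hdfs://, s3://, ...) instead of hardcoding if blocks, while the cache handles the many-namenodes case.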
When will HDFS be exposed in Python with the new format?
I could do a proposal for the fs resolver. But basically it would just be:
- move resolve_filesystem_and_path to a new module
- expose new python objects based on the scheme
- create the fs
- add some internal caching
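To make the proposal concrete, a hedged sketch of the steps above in one module (all names are hypothetical; _FakeHdfs is just a placeholder where the real code would wrap libhdfs):

```python
from urllib.parse import urlparse

# Hypothetical cache: (scheme, netloc) -> filesystem instance.
_fs_cache = {}

class _FakeHdfs:
    # Placeholder for an HDFS client; a real implementation would call
    # libhdfs via FileSystem#get(URI, conf, ugi), which also resolves
    # viewfs:// federation.
    def __init__(self, host):
        self.host = host

def resolve_filesystem_and_path(where):
    parsed = urlparse(where)
    if parsed.scheme in ("hdfs", "viewfs"):
        # create the fs for this scheme, caching per namenode
        key = (parsed.scheme, parsed.netloc)
        if key not in _fs_cache:
            _fs_cache[key] = _FakeHdfs(parsed.netloc)
        return _fs_cache[key], parsed.path
    # no scheme: treat it as a plain local path
    return None, where
```

The point is only the shape: dispatch on the scheme, create the fs, cache it, return (fs, path).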
> [Python] Support HDFS federation viewfs:// in resolve_filesystem_and_path
> -------------------------------------------------------------------------
>
> Key: ARROW-7309
> URL: https://issues.apache.org/jira/browse/ARROW-7309
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.15.1
> Reporter: Fabian Höring
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> - Add viewfs support to pyarrow.filesystem.resolve_filesystem_and_path
> - libhdfs already supports injecting the scheme and will automatically
> resolve federation via
> fs = FileSystem#get(URI, conf, ugi)
> - works with Hadoop 2/3
> see:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L770
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L637
--
This message was sent by Atlassian Jira
(v8.3.4#803005)