[
https://issues.apache.org/jira/browse/ARROW-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396846#comment-17396846
]
Itamar Turner-Trauring commented on ARROW-9226:
-----------------------------------------------
Digging through the code, it doesn't seem like this logic was ever implemented
in Arrow itself; deep down enough, it's logic from `libhdfs`/`libhdfs3`. If I
read this correctly, since the new API still uses those underneath, it's
probably just a matter of (re)exposing the low-level logic in the Arrow wrapper.
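For reference, the options in question live in the standard Hadoop client configuration files, which libhdfs loads from the classpath (via HADOOP_CONF_DIR). A sketch of what an HA setup typically looks like; the nameservice and host names below are made up for illustration:

```xml
<!-- core-site.xml: the default filesystem points at a logical HA
     nameservice rather than a single namenode host -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
</configuration>

<!-- hdfs-site.xml: the nameservice resolves to the individual namenodes,
     and the failover proxy provider picks whichever one is active -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

If I understand libhdfs correctly, passing "default" as the namenode makes it fall back to fs.defaultFS from these loaded files, which is presumably how the legacy pyarrow.hdfs.connect handled this; exposing that same path in the new wrapper may be all that's needed.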
> [Python] pyarrow.fs.HadoopFileSystem - retrieve options from core-site.xml or
> hdfs-site.xml if available
> --------------------------------------------------------------------------------------------------------
>
> Key: ARROW-9226
> URL: https://issues.apache.org/jira/browse/ARROW-9226
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Affects Versions: 0.17.1
> Reporter: Bruno Quinart
> Priority: Minor
> Labels: hdfs
>
> 'Legacy' pyarrow.hdfs.connect was somehow able to get the namenode info from
> the hadoop configuration files.
> The new pyarrow.fs.HadoopFileSystem requires the host to be specified.
> Inferring this info from "the environment" makes it easier to deploy
> pipelines.
> More importantly, with HA namenodes it is almost impossible to know for sure
> what to specify: during a rolling restart the active namenode changes, and in
> an HA setup there is no guarantee which one will be active.
> I tried connecting to the standby namenode. The connection gets established,
> but writing a file then fails with an error saying that writing to a standby
> namenode is not allowed.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)