[
https://issues.apache.org/jira/browse/ARROW-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney closed ARROW-8154.
-------------------------------
> [Python] HDFS Filesystem does not set environment variables in pyarrow
> 0.16.0 release
> --------------------------------------------------------------------------------------
>
> Key: ARROW-8154
> URL: https://issues.apache.org/jira/browse/ARROW-8154
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.16.0
> Reporter: Eric Henry
> Priority: Major
> Fix For: 0.17.0
>
>
> In pyarrow 0.15.x, HDFS filesystem works as follows:
> If you set HADOOP_HOME env var, it looks for libhdfs.so in
> $HADOOP_HOME/lib/native.
> In pyarrow 0.16.x, if you set HADOOP_HOME, it looks for libhdfs.so in
> $HADOOP_HOME, which is incorrect behaviour on all systems I am using.
> Also, CLASSPATH no longer gets set automatically, which is very inconvenient.
> The issue here is that I need HADOOP_HOME set correctly to be able to use
> other libraries, but have to repoint it in order to use Apache Arrow, e.g.:
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> # ...do stuff here...
> # ...then repoint it to connect with arrow...
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop/lib/native"
> hdfs = pyarrow.hdfs.connect(host, port)
> # ...then reset my hadoop home...
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
>
> Example:
> >>> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> >>> hdfs = pyarrow.hdfs.connect(host, port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 215, in connect
>     extra_conf=extra_conf)
>   File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 40, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> OSError: Unable to load libhdfs: /usr/lib/hadoop/libhdfs.so: cannot open
> shared object file: No such file or directory
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)