[ 
https://issues.apache.org/jira/browse/ARROW-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036379#comment-17036379
 ] 

Jack Fan commented on ARROW-7841:
---------------------------------

[~kszucs]
{quote}What are the values for HADOOP_HOME and ARROW_LIBHDFS_DIR environment 
variables?
{quote}
{code:java}
 $ echo $HADOOP_HOME
/opt/hadoop/latest
$ echo $ARROW_LIBHDFS_DIR
{code}
 
{quote}Arrow tries to load libhdfs.so from {{$HADOOP_HOME/libhdfs.so}} and 
{{$ARROW_LIBHDFS_DIR/libhdfs.so}}
{quote}
Why is there a change of behaviour in version 0.16.0?

According to [https://arrow.apache.org/docs/python/filesystems.html], 
"{{ARROW_LIBHDFS_DIR}} (optional): explicit location of {{libhdfs.so}} if it is 
installed somewhere other than {{$HADOOP_HOME/lib/native}}."

IMHO it doesn't seem to make sense to try loading from $HADOOP_HOME/libhdfs.so.
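For reference, the search order described in the docs (an explicit {{ARROW_LIBHDFS_DIR}} overriding the default {{$HADOOP_HOME/lib/native}}) can be sketched as below. This is just my reading of the documentation, not pyarrow's actual code; the helper name is hypothetical:

```python
import os

# Sketch of the *documented* search order (helper name is hypothetical,
# not a pyarrow API): ARROW_LIBHDFS_DIR, if set, takes precedence over
# the default $HADOOP_HOME/lib/native location of libhdfs.so.
def libhdfs_candidates(hadoop_home, arrow_libhdfs_dir=None):
    search_dirs = []
    if arrow_libhdfs_dir:
        search_dirs.append(arrow_libhdfs_dir)
    search_dirs.append(os.path.join(hadoop_home, "lib", "native"))
    return [os.path.join(d, "libhdfs.so") for d in search_dirs]

print(libhdfs_candidates("/opt/hadoop/latest"))
# ['/opt/hadoop/latest/lib/native/libhdfs.so']
```

Under this order, {{$HADOOP_HOME/libhdfs.so}} (without the {{lib/native}} component, as in the 0.16.0 error message) would never be tried.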

> pyarrow release 0.16.0 breaks `libhdfs.so` loading mechanism
> ------------------------------------------------------------
>
>                 Key: ARROW-7841
>                 URL: https://issues.apache.org/jira/browse/ARROW-7841
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Jack Fan
>            Priority: Major
>             Fix For: 0.15.1
>
>
> I have my env variables set up correctly according to the pyarrow README
> {code:java}
> $ ls $HADOOP_HOME/lib/native
> libhadoop.a  libhadooppipes.a  libhadoop.so  libhadoop.so.1.0.0  
> libhadooputils.a  libhdfs.a  libhdfs.so  libhdfs.so.0.0.0 {code}
> Use the following script to reproduce
> {code:java}
> import pyarrow
> pyarrow.hdfs.connect('hdfs://localhost'){code}
> With pyarrow version 0.15.1 it works fine.
> However, version 0.16.0 gives the following error:
> {code:java}
> Traceback (most recent call last):
>   File "<string>", line 2, in <module>
>   File 
> "/home/jackwindows/anaconda2/lib/python2.7/site-packages/pyarrow/hdfs.py", 
> line 215, in connect
>     extra_conf=extra_conf)
>   File 
> "/home/jackwindows/anaconda2/lib/python2.7/site-packages/pyarrow/hdfs.py", 
> line 40, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 89, in 
> pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> IOError: Unable to load libhdfs: /opt/hadoop/latest/libhdfs.so: cannot open 
> shared object file: No such file or directory {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
