[ 
https://issues.apache.org/jira/browse/ARROW-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954440#comment-15954440
 ] 

Benjamin Zaitlen commented on ARROW-762:
----------------------------------------

Apologies, I was sidetracked by some other work.

Running on CentOS 7.2 and HDP 2.4.3.

The following worked for me:

1. export ARROW_LIBHDFS_DIR=/usr/hdp/2.4.3.0-227/usr/lib/
2. export CLASSPATH=$CLASSPATH:`hdfs classpath --glob`
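
The same two settings can also be applied from Python before importing pyarrow. This is only a sketch: the libhdfs path below is the one from this HDP 2.4.3 install and will differ on CDH/MapR, and the helper name `setup_hadoop_env` is mine, not part of any API.

```python
import os
import shutil
import subprocess

def setup_hadoop_env(libhdfs_dir="/usr/hdp/2.4.3.0-227/usr/lib/"):
    """Point Arrow at libhdfs and add the Hadoop jars to CLASSPATH."""
    # Step 1: tell Arrow where to find libhdfs.so
    os.environ["ARROW_LIBHDFS_DIR"] = libhdfs_dir
    # Step 2: append the output of `hdfs classpath --glob`, if the
    # Hadoop CLI is on PATH (skip gracefully on machines without it)
    if shutil.which("hdfs"):
        jars = subprocess.check_output(
            ["hdfs", "classpath", "--glob"]).decode().strip()
        os.environ["CLASSPATH"] = (
            os.environ.get("CLASSPATH", "") + ":" + jars)

setup_hadoop_env()
# With the environment set, the connection call from the issue should work:
# from pyarrow import HdfsClient
# hdfs = HdfsClient(host='...', port=8020, kerb_ticket='/tmp/krb5cc_1000',
#                   user='centos')
```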

I think the libhdfs search in Arrow is fairly exhaustive, but docs pointing 
to the locations common to CDH/HDP/MapR installs would probably be helpful. 
I was only able to figure out step 2 thanks to your note about checking what 
TensorFlow does.  I eventually came across this page: 
https://www.tensorflow.org/deploy/hadoop .  Something similar, or a link 
from the Arrow docs, would also be helpful.

Also, I'm happy to help add to the docs, but if it's easy for you please go 
ahead.  I'll leave it to you to close the issue.

> Kerberos Problem with PyArrow
> -----------------------------
>
>                 Key: ARROW-762
>                 URL: https://issues.apache.org/jira/browse/ARROW-762
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.2.0
>         Environment: Centos 7.2, HDP 2.4.3
>            Reporter: Benjamin Zaitlen
>
> I'm having trouble using pyarrow with kerberos.  I'm trying to connect to 
> HDFS with the following signature:
> ```
> hdfs = HdfsClient(host='ip-172-31-53-87.ec2.internal', port=8020, 
> kerb_ticket='/tmp/krb5cc_1000', driver='libhdfs3', user='centos')
> ArrowException                            Traceback (most recent call last)
> <ipython-input-2-15087f93c239> in <module>()
> ----> 1 hdfs = HdfsClient(host='ip-172-31-53-87.ec2.internal', port=8020, 
> kerb_ticket='/tmp/krb5cc_1000', driver='libhdfs3', user='centos')
> /home/centos/miniconda3/envs/hdfs_test/lib/python3.5/site-packages/pyarrow/filesystem.py
>  in __init__(self, host, port, user, kerb_ticket, driver)
>     168     def __init__(self, host="default", port=0, user=None, 
> kerb_ticket=None,
>     169                  driver='libhdfs'):
> --> 170         self._connect(host, port, user, kerb_ticket, driver)
>     171
>     172     @implements(Filesystem.isdir)
> /home/centos/miniconda3/envs/hdfs_test/lib/python3.5/site-packages/pyarrow/io.pyx
>  in pyarrow.io._HdfsClient._connect 
> (/feedstock_root/build_artefacts/pyarrow_1488727736041/work/arrow-f6924ad83bc95741f003830892ad4815ca3b70fd/python/build/temp.linux-x86_64-3.5/io.cxx:11090)()
> /home/centos/miniconda3/envs/hdfs_test/lib/python3.5/site-packages/pyarrow/error.pyx
>  in pyarrow.error.check_status 
> (/feedstock_root/build_artefacts/pyarrow_1488727736041/work/arrow-f6924ad83bc95741f003830892ad4815ca3b70fd/python/build/temp.linux-x86_64-3.5/error.cxx:1197)()
> ArrowException: IOError: HDFS connection failed
> ```
> Below shows a valid ticket:
> ```
> [centos@ip-172-31-61-224 usr]$ klist
> Ticket cache: FILE:/tmp/krb5cc_1000
> Default principal: centos@DOMAIN
> Valid starting       Expires              Service principal
> 04/03/2017 14:36:38  04/04/2017 14:36:38  krbtgt/DOMAIN@DOMAIN
>         renew until 04/10/2017 14:36:38
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
