[
https://issues.apache.org/jira/browse/ARROW-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Saurabh Bajaj closed ARROW-5922.
--------------------------------
Resolution: Works for Me
> [Python] Unable to connect to HDFS from a worker/data node on a Kerberized
> cluster using pyarrow's hdfs API
> ----------------------------------------------------------------------------------------------------------
>
> Key: ARROW-5922
> URL: https://issues.apache.org/jira/browse/ARROW-5922
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.0
> Environment: Unix
> Reporter: Saurabh Bajaj
> Priority: Major
> Fix For: 0.14.0
>
>
> Here's what I'm trying:
> ```
> import pyarrow as pa
> conf = {"hadoop.security.authentication": "kerberos"}
> fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
> ```
> However, when I submit this job to the cluster using {{Dask-YARN}}, I get the
> following error:
> ```
> File "test/run.py", line 3
>   fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
> File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 211, in connect
> File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 38, in __init__
> File "pyarrow/io-hdfs.pxi", line 105, in pyarrow.lib.HadoopFileSystem._connect
> File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: HDFS connection failed
> ```
> I also tried setting {{host}} (to a name node) and {{port}} (=8020), but I
> run into the same error. Since the error message is not descriptive, I'm not
> sure which setting needs to be changed. Any clues, anyone?
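> For reference, here is a minimal sketch of the explicit host/port variant I
> tried; the host name below is a placeholder for our active NameNode:
> ```
> import pyarrow as pa
>
> conf = {"hadoop.security.authentication": "kerberos"}
> # "namenode.example.com" is a placeholder; substitute your cluster's
> # active NameNode. Port 8020 is the default HDFS NameNode RPC port.
> fs = pa.hdfs.connect(
>     host="namenode.example.com",
>     port=8020,
>     kerb_ticket="/tmp/krb5cc_44444",
>     extra_conf=conf,
> )
> ```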
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)