[ https://issues.apache.org/jira/browse/ARROW-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saurabh Bajaj closed ARROW-5922.
--------------------------------
    Resolution: Works for Me

> [Python] Unable to connect to HDFS from a worker/data node on a Kerberized
> cluster using pyarrow's hdfs API
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-5922
>                 URL: https://issues.apache.org/jira/browse/ARROW-5922
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>       Environment: Unix
>          Reporter: Saurabh Bajaj
>          Priority: Major
>           Fix For: 0.14.0
>
> Here's what I'm trying:
> ```
> import pyarrow as pa
> conf = {"hadoop.security.authentication": "kerberos"}
> fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
> ```
> However, when I submit this job to the cluster using Dask-YARN, I get the
> following error:
> ```
>   File "test/run.py", line 3, in <module>
>     fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
>   File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 211, in connect
>   File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 38, in __init__
>   File "pyarrow/io-hdfs.pxi", line 105, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: HDFS connection failed
> ```
> I also tried setting `host` (to a name node) and `port` (=8020), but I ran
> into the same error. Since the error message is not descriptive, I'm not
> sure which setting needs to be changed. Any clues, anyone?

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
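The issue was closed "Works for Me", so no Arrow change resulted. For readers hitting the same opaque "HDFS connection failed" error: pyarrow's legacy hdfs driver loads the Hadoop Java classpath via libhdfs, and on YARN worker containers that environment is often not set up the way it is on an edge node. Below is a minimal sketch of assembling the connection arguments; the namenode host is a placeholder, `hdfs_connect_kwargs` is a hypothetical helper (not a pyarrow API), and the actual `pa.hdfs.connect` call is left commented out because it requires a live Kerberized cluster.

```python
def hdfs_connect_kwargs(host="namenode.example.com", port=8020,
                        kerb_ticket="/tmp/krb5cc_44444"):
    """Hypothetical helper: build keyword arguments for pa.hdfs.connect.

    The host value is a placeholder; kerb_ticket must point at a valid
    Kerberos ticket cache readable by the worker process.
    """
    conf = {"hadoop.security.authentication": "kerberos"}
    return {"host": host, "port": port,
            "kerb_ticket": kerb_ticket, "extra_conf": conf}

# The driver needs the Hadoop classpath in the worker's environment; a
# missing CLASSPATH commonly surfaces as the opaque connection error.
# In the container, before connecting, something like:
#   export CLASSPATH=$(hadoop classpath --glob)
#
# import pyarrow as pa
# fs = pa.hdfs.connect(**hdfs_connect_kwargs())
```

Because the helper only builds a dict, it can be sanity-checked without a cluster; the connect call itself only succeeds where libhdfs, the Hadoop classpath, and a valid ticket are all present.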