[ https://issues.apache.org/jira/browse/ARROW-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Fulton updated ARROW-3957:
------------------------------
    Description: 
I'm trying to connect to HDFS using libhdfs and Kerberos.

I have JAVA_HOME and HADOOP_HOME set, and {{pyarrow.hdfs.connect}} sets CLASSPATH correctly.

My connect call looks like:

{{import pyarrow.hdfs}}
{{c = pyarrow.hdfs.connect(host='MYHOST', port=42424,}}
{{                         user='ME', kerb_ticket="/tmp/krb5cc_498970")}}

This doesn't raise an error, but the resulting connection can't do anything. Operations on it either fail like this:

{{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255)}}

or swallow errors (e.g. {{exists}} returning {{False}}).

Note that {{connect}} raises an error if the host is wrong, but not if the port, user, or kerb_ticket is wrong. I have no idea how to debug this, because there are no useful errors.

Note that I _can_ connect using the hdfs Python package. (Of course, that doesn't provide the API I need to read Parquet files.)

Any help would be greatly appreciated.
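Since {{connect}} reports nothing when the ticket path is wrong, a local pre-flight check could at least rule out the simplest mistakes before connecting. The following is a minimal sketch and not part of pyarrow: {{preflight_problems}} is a hypothetical helper name, and it only inspects the local environment variables and the ticket cache file. It cannot detect a wrong port or user, and it does not verify that the ticket is still valid.

```python
import os


def preflight_problems(kerb_ticket, env=None):
    """Return a list of human-readable problems found before connecting.

    Hypothetical helper for illustration; it performs local checks only.
    """
    env = os.environ if env is None else env
    problems = []

    # libhdfs needs to locate a JVM and a Hadoop installation.
    for var in ("JAVA_HOME", "HADOOP_HOME"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{var} does not point to a directory: {value}")

    # A wrong ticket cache path is not reported by connect, so check it here.
    if not os.path.isfile(kerb_ticket):
        problems.append(f"Kerberos ticket cache not found: {kerb_ticket}")
    elif not os.access(kerb_ticket, os.R_OK):
        problems.append(f"Kerberos ticket cache not readable: {kerb_ticket}")

    return problems
```

Calling {{preflight_problems("/tmp/krb5cc_498970")}} before {{pyarrow.hdfs.connect}} and printing the returned list would show whether any of these local preconditions is missing. An empty list only means the local checks passed.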
> [Python] pyarrow.hdfs.connect fails silently
> --------------------------------------------
>
>                 Key: ARROW-3957
>                 URL: https://issues.apache.org/jira/browse/ARROW-3957
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>         Environment: centos 7
>            Reporter: Jim Fulton
>            Priority: Major
>              Labels: hdfs