Jim Fulton created ARROW-3957:
---------------------------------

             Summary: pyarrow.hdfs.connect fails silently
                 Key: ARROW-3957
                 URL: https://issues.apache.org/jira/browse/ARROW-3957
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.11.1
         Environment: centos 7
            Reporter: Jim Fulton

I'm trying to connect to HDFS using libhdfs and Kerberos.

I have JAVA_HOME and HADOOP_HOME set, and {{pyarrow.hdfs.connect}} sets CLASSPATH correctly.

My connect call looks like:

{{import pyarrow.hdfs
c = pyarrow.hdfs.connect(host='MYHOST', port=42424, user='ME', kerb_ticket="/tmp/krb5cc_498970")
}}

This doesn't raise an error, but the resulting connection can't do anything. Operations on it either fail with an error like this:

{{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) }}

or swallow the error (e.g. {{exists}} returning {{False}}).

Note that {{connect}} raises an error if the host is wrong, but not if the port, user, or kerb_ticket is wrong.

I have no idea how to debug this, because there are no useful error messages.

Note that I _can_ connect using the hdfs Python package. (Of course, that doesn't provide the API I need to read Parquet files.)

Any help would be greatly appreciated.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)