[ https://issues.apache.org/jira/browse/ARROW-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715591#comment-16715591 ]
Jim Fulton edited comment on ARROW-3957 at 12/10/18 9:39 PM:
-------------------------------------------------------------

A contributing factor was that I was using a Jupyter notebook, which hid some output. When I ran outside of a notebook, I could see a Java traceback featuring:

{{java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length}}

I also tried the hdfs command-line tool and saw the same error, so I knew I was screwing up consistently. ;)

Eventually, I realized I was using the wrong protocol.

was (Author: j1m):
A contributing factor was that I was using a Jupyter notebook, which hid some output. When I ran outside of a notebook, I could see a Java traceback featuring:

{{java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length}}

I also tried the hdfs command-line tool and saw the same error, so I know I was screwing up consistently. ;)

Eventually, I realized I was using the wrong protocol.

> [Python] pyarrow.hdfs.connect fails silently
> --------------------------------------------
>
>                 Key: ARROW-3957
>                 URL: https://issues.apache.org/jira/browse/ARROW-3957
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>         Environment: centos 7
>            Reporter: Jim Fulton
>            Priority: Major
>              Labels: hdfs
>
> I'm trying to connect to HDFS using libhdfs and Kerberos.
> I have JAVA_HOME and HADOOP_HOME set and {{pyarrow.hdfs.connect}} sets
> CLASSPATH correctly.
> My connect call looks like:
> {{import pyarrow.hdfs}}
> {{c = pyarrow.hdfs.connect(host='MYHOST', port=42424,}}
> {{ user='ME', kerb_ticket="/tmp/krb5cc_498970")}}
> This doesn't error but the resulting connection can't do anything. Calls on
> it either error like this:
> {{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) }}
> Or swallow errors (e.g. {{exists}} returning {{False}}).
> Note that {{connect}} errors if the host is wrong but doesn't error if the
> port, user, or kerb_ticket are wrong. I have no idea how to debug this,
> because there are no useful errors.
> Note that I _can_ connect using the hdfs Python package. (Of course, that
> doesn't provide the API I need to read Parquet files.)
> Any help would be appreciated greatly.
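
For anyone who hits the same "RPC response exceeds maximum data length" traceback: the comment above only says the wrong protocol was in use. One possible reading, and it is an assumption rather than something stated in this thread, is that the port given to {{pyarrow.hdfs.connect}} was an HTTP (WebHDFS-style) endpoint rather than the NameNode RPC port that libhdfs expects. Under that assumption, a rough probe might look like the sketch below. The helper name {{speaks_http}} is hypothetical and is not part of pyarrow or Hadoop.

```python
import socket


def speaks_http(host, port, timeout=5.0):
    """Return True if host:port answers a plain HTTP request with an HTTP status line.

    Hypothetical diagnostic helper (not a pyarrow or Hadoop API). A True result
    hints that the port is an HTTP endpoint, which a Hadoop RPC client such as
    libhdfs cannot talk to.
    """
    request = "HEAD / HTTP/1.0\r\nHost: {}\r\n\r\n".format(host).encode("ascii")
    # Connection failures (bad host, closed port) are allowed to propagate.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            sock.sendall(request)
            reply = sock.recv(16)
        except OSError:
            # Timeout or reset: whatever is listening did not answer as HTTP.
            return False
    return reply.startswith(b"HTTP/")
```

If {{speaks_http('MYHOST', 42424)}} returned True, that would suggest the port is serving HTTP and that the RPC port from the cluster configuration should be tried instead. A False result is not conclusive either way, since a TLS-only or otherwise non-HTTP service would also produce it.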