[ https://issues.apache.org/jira/browse/ARROW-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723579#comment-16723579 ]
Wes McKinney commented on ARROW-3957:
-------------------------------------

I was able to reproduce the issue by connecting to a live HDFS cluster with the wrong port:

{code}
In [1]: import pyarrow as pa

In [2]: con = pa.hdfs.connect('localhost', port=50070)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-12-17 19:45:45,570 WARN util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

In [3]: con.ls('/')
hdfsListDirectory(/): FileSystem#listStatus error:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "badgerpad16/127.0.1.1"; destination host is: "localhost":50070;
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
	at org.apache.hadoop.ipc.Client.call(Client.java:1474)
	at org.apache.hadoop.ipc.Client.call(Client.java:1401)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy9.getListing(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy10.getListing(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1958)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1941)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
	at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
	at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
	at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
	at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
	at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
	at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
	at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
	at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
	at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1074)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:968)

---------------------------------------------------------------------------
ArrowIOError                              Traceback (most recent call last)
<ipython-input-3-cf288d1dd1d6> in <module>()
----> 1 con.ls('/')

~/code/arrow/python/pyarrow/hdfs.py in ls(self, path, detail)
     99         result : list of dicts (detail=True) or strings (detail=False)
    100         """
--> 101         return super(HadoopFileSystem, self).ls(path, detail)
    102
    103     def walk(self, top_path):

~/code/arrow/python/pyarrow/io-hdfs.pxi in pyarrow.lib.HadoopFileSystem.ls()
    270
    271         with nogil:
--> 272             check_status(self.client.get()
    273                          .ListDirectory(c_path, &listing))
    274

~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     81         raise ArrowInvalid(message)
     82     elif status.IsIOError():
---> 83         raise ArrowIOError(message)
     84     elif status.IsOutOfMemory():
     85         raise ArrowMemoryError(message)

ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255)
{code}

It might be worth suggesting that the port is wrong when the user gets errno 255; a rough sketch of such a check follows the quoted issue below.

> [Python] pyarrow.hdfs.connect fails silently
> --------------------------------------------
>
>                 Key: ARROW-3957
>                 URL: https://issues.apache.org/jira/browse/ARROW-3957
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>         Environment: centos 7
>            Reporter: Jim Fulton
>            Priority: Major
>              Labels: hdfs
>
> I'm trying to connect to HDFS using libhdfs and Kerberos.
> I have JAVA_HOME and HADOOP_HOME set, and {{pyarrow.hdfs.connect}} sets CLASSPATH correctly.
> My connect call looks like:
> {{import pyarrow.hdfs}}
> {{c = pyarrow.hdfs.connect(host='MYHOST', port=42424,}}
> {{                         user='ME', kerb_ticket="/tmp/krb5cc_498970")}}
> This doesn't error, but the resulting connection can't do anything. Operations on it either error like this:
> {{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255)}}
> or they swallow errors (e.g. {{exists}} returning {{False}}).
> Note that {{connect}} errors if the host is wrong but doesn't error if the port, user, or kerb_ticket are wrong. I have no idea how to debug this, because there are no useful errors.
> Note that I _can_ connect using the hdfs Python package. (Of course, that doesn't provide the API I need to read Parquet files.)
> Any help would be appreciated greatly.
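To make the suggestion above concrete, here is a minimal user-side sketch (the helper {{connect_checked}} is hypothetical and not part of pyarrow) that probes the connection immediately after {{connect()}} and re-raises the opaque errno-255 failure with a hint about the likely cause:

{code}
# Hypothetical user-side workaround, not part of pyarrow: probe the
# connection right after connect() so that a wrong port, user, or
# Kerberos ticket fails immediately with a hint, instead of surfacing
# later as an opaque "errno: 255".
import pyarrow as pa


def connect_checked(host, port, **kwargs):
    con = pa.hdfs.connect(host, port=port, **kwargs)
    try:
        # connect() alone does not verify that the endpoint speaks the
        # HDFS RPC protocol; a cheap listing forces a NameNode round trip.
        con.ls('/')
    except pa.lib.ArrowIOError as exc:
        raise pa.lib.ArrowIOError(
            "Connected to {0}:{1} but the filesystem is not usable; "
            "check that the port is the NameNode RPC port (not the "
            "WebHDFS HTTP port) and that the Kerberos ticket is valid. "
            "Original error: {2}".format(host, port, exc))
    return con
{code}

This only papers over the problem on the caller's side; the error string itself comes from the C++ HDFS client via {{check_status}} (see the traceback above), so a library-level hint would presumably have to be attached there.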