[ 
https://issues.apache.org/jira/browse/ARROW-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723579#comment-16723579
 ] 

Wes McKinney commented on ARROW-3957:
-------------------------------------

I was able to reproduce the issue by connecting to a live HDFS cluster with the 
wrong port

{code}
In [1]: import pyarrow as pa

In [2]: con = pa.hdfs.connect('localhost', port=50070)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-12-17 19:45:45,570 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

In [3]: con.ls('/')
hdfsListDirectory(/): FileSystem#listStatus error:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "badgerpad16/127.0.1.1"; destination host is: "localhost":50070;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
        at org.apache.hadoop.ipc.Client.call(Client.java:1474)
        at org.apache.hadoop.ipc.Client.call(Client.java:1401)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy9.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:554)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy10.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1958)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1941)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
        at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
        at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
        at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
        at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
        at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
        at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1074)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:968)
---------------------------------------------------------------------------
ArrowIOError                              Traceback (most recent call last)
<ipython-input-3-cf288d1dd1d6> in <module>()
----> 1 con.ls('/')

~/code/arrow/python/pyarrow/hdfs.py in ls(self, path, detail)
     99         result : list of dicts (detail=True) or strings (detail=False)
    100         """
--> 101         return super(HadoopFileSystem, self).ls(path, detail)
    102 
    103     def walk(self, top_path):

~/code/arrow/python/pyarrow/io-hdfs.pxi in pyarrow.lib.HadoopFileSystem.ls()
    270 
    271         with nogil:
--> 272             check_status(self.client.get()
    273                          .ListDirectory(c_path, &listing))
    274 

~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     81             raise ArrowInvalid(message)
     82         elif status.IsIOError():
---> 83             raise ArrowIOError(message)
     84         elif status.IsOutOfMemory():
     85             raise ArrowMemoryError(message)

ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255)
{code}

It might be worth suggesting that the port may be wrong when the user gets errno 255. (Here, 50070 is the NameNode web UI port rather than the RPC port, so the server answers with HTTP and the client's protobuf parser fails with the "end-group tag" error above.)
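The suggestion could be sketched as a small helper (hypothetical; the function name and the wording of the hint are mine, not anything in pyarrow) that inspects the error message and adds a hint before it reaches the user:

```python
import re

def hint_for_hdfs_error(message):
    """Return an extra diagnostic hint for an HDFS error message, or None.

    Sketch only: errno 255 from libhdfs carries no specific cause, and in
    practice a wrong NameNode port is a common one, so surface that as a
    suggestion alongside the original message.
    """
    if re.search(r"errno:\s*255", message):
        return ("errno 255 often indicates a misconfigured connection; "
                "check that the port is the NameNode RPC port "
                "(not the web UI port).")
    return None
```

A caller could append the returned hint to the ArrowIOError message before re-raising, leaving other errno values untouched.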

> [Python] pyarrow.hdfs.connect fails silently
> --------------------------------------------
>
>                 Key: ARROW-3957
>                 URL: https://issues.apache.org/jira/browse/ARROW-3957
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>         Environment: centos 7
>            Reporter: Jim Fulton
>            Priority: Major
>              Labels: hdfs
>
> I'm trying to connect to HDFS using libhdfs and Kerberos.
> I have JAVA_HOME and HADOOP_HOME set and {{pyarrow.hdfs.connect}} sets 
> CLASSPATH correctly.
> My connect call looks like:
> {code}
> import pyarrow.hdfs
> c = pyarrow.hdfs.connect(host='MYHOST', port=42424,
>                          user='ME', kerb_ticket="/tmp/krb5cc_498970")
> {code}
> This doesn't error, but the resulting connection can't do anything. Calls on 
> it either error like this:
> {{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) }}
> Or swallow errors (e.g. {{exists}} returning {{False}}).
> Note that {{connect}} errors if the host is wrong, but doesn't error if the 
> port, user, or kerb_ticket are wrong. I have no idea how to debug this, 
> because no useful errors are reported.
> Note that I _can_ connect using the hdfs Python package. (Of course, that 
> doesn't provide the API I need to read Parquet files.)
> Any help would be appreciated greatly.
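Since {{connect}} succeeds lazily and only later calls fail, one workaround is to probe the filesystem immediately after connecting so a bad port, user, or ticket fails fast. A minimal sketch, assuming only that the client exposes an {{ls}} method ({{verify_connection}} is my own name, not a pyarrow API; pyarrow's ArrowIOError subclasses IOError, so catching OSError covers it):

```python
def verify_connection(client, probe_path='/'):
    """Probe a freshly created HDFS client so misconfiguration
    surfaces here rather than on some later call.

    `client` is any object with an ls(path) method, e.g. the object
    returned by pyarrow.hdfs.connect.
    """
    try:
        client.ls(probe_path)  # cheap round-trip to the NameNode
    except OSError as exc:
        raise ConnectionError(
            "HDFS connection is not usable; check the NameNode RPC port, "
            "user, and Kerberos ticket") from exc
    return client
```

This doesn't make the underlying errors more informative, but it at least pins the failure to connection setup instead of an arbitrary later operation.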



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
