[ https://issues.apache.org/jira/browse/ARROW-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135940#comment-17135940 ]
Thomas Graves commented on ARROW-9019:
--------------------------------------

Can you give more details on what was missing? I used the exact same setup and it worked with Hadoop 2.9.

> [Python] hdfs fails to connect for HDFS 3.x cluster
> ---------------------------------------------------
>
>                 Key: ARROW-9019
>                 URL: https://issues.apache.org/jira/browse/ARROW-9019
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Thomas Graves
>            Priority: Major
>              Labels: filesystem, hdfs
>
> I'm trying to use the pyarrow hdfs connector with Hadoop 3.1.3 and I get an
> error that looks like a protobuf or jar mismatch problem with Hadoop. The
> same code works on a Hadoop 2.9 cluster.
>
> I'm wondering if there is something special I need to do or if pyarrow
> doesn't support Hadoop 3.x yet?
> Note I tried with pyarrow 0.15.1, 0.16.0, and 0.17.1.
>
> import pyarrow as pa
> hdfs_kwargs = dict(host="namenodehost",
>                    port=9000,
>                    user="tgraves",
>                    driver='libhdfs',
>                    kerb_ticket=None,
>                    extra_conf=None)
> fs = pa.hdfs.connect(**hdfs_kwargs)
> res = fs.exists("/user/tgraves")
>
> Error that I get on Hadoop 3.x is:
>
> dfsExists: invokeMethod((Lorg/apache/hadoop/fs/Path;)Z) error:
> ClassCastException:
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto
> cannot be cast to
> org.apache.hadoop.shaded.com.google.protobuf.Message
> java.lang.ClassCastException:
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto
> cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>     at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:904)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>     at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1661)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1577)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
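For readers hitting the same shaded-protobuf ClassCastException: since the reporter suspects a jar mismatch, one thing worth checking is the CLASSPATH that libhdfs hands to the embedded JVM, because a classpath built from mismatched Hadoop versions (or with unexpanded wildcards) can produce exactly this kind of error. The sketch below is illustrative, not a confirmed fix for this ticket; the helper name `configure_hadoop_env` and the paths in the comments are assumptions, while `hadoop classpath --glob` and the `CLASSPATH`/`ARROW_LIBHDFS_DIR` environment variables are standard Hadoop/pyarrow knobs.

```python
import os
import subprocess

def configure_hadoop_env(hadoop_cmd="hadoop"):
    """Expand Hadoop's wildcard classpath into concrete jar paths and
    export it, so the JVM started by libhdfs sees one consistent set of
    Hadoop jars. `hadoop_cmd` is the Hadoop CLI binary to invoke."""
    classpath = subprocess.check_output(
        [hadoop_cmd, "classpath", "--glob"], text=True
    ).strip()
    os.environ["CLASSPATH"] = classpath
    return classpath

# Usage on a cluster node (paths depend on the local install):
#   os.environ["ARROW_LIBHDFS_DIR"] = "/opt/hadoop/lib/native"  # assumed path
#   configure_hadoop_env()
#   import pyarrow as pa
#   fs = pa.hdfs.connect(host="namenodehost", port=9000, user="tgraves")
```

Doing this before the first `pa.hdfs.connect()` call matters because the JVM classpath is fixed once libhdfs has started it.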