milenkovicm commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2060661362
First of all, I agree with @tustvold and @alamb: the arrow maintainers should not take on this responsibility, since an HDFS store is a bit more complicated than the object stores.

IMHO, there are two directions that could be taken:

1. Implement HDFS support based on C++ `libhdfs`. To my knowledge there are two somewhat maintained repositories:

   - https://github.com/apache/hawq
   - https://github.com/ClickHouse/libhdfs3

   and there are a few bindings generated for it, one of which is https://github.com/datafusion-contrib/hdfs-native

   pros:

   - this approach is not too hard to implement, though some effort is needed for the Rust bindings
   - it is almost a drop-in replacement for the Java wrapper in https://github.com/datafusion-contrib/datafusion-objectstore-hdfs, which needs some improvements as well
   - basic operations, including Kerberos, work
   - it IS more performant, less resource hungry and easier to use than the Java version

   cons:

   - libhdfs needs work to reach parity with the latest HDFS interface, which might take a bit of effort

2. The second approach is to write a native Rust HDFS library, and I believe @Kimahriman's https://github.com/Kimahriman/hdfs-native is on the right track. I haven't used the library and can't tell how performant it is, but IMHO the direction looks right.

   pros:

   - we'd have an up-to-date HDFS library in Rust

   cons:

   - we need to invest some effort to get there
