milenkovicm commented on issue #5638:
URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2060661362

   First of all, I agree with @tustvold and @alamb: Arrow maintainers should not take on this responsibility, as an HDFS store is a bit more complicated than object stores.
   
   IMHO, there are two directions which could be taken: 
   
   1. Implement HDFS support based on the C++ `libhdfs`. To my knowledge, there are two somewhat maintained repositories:
   
   - https://github.com/apache/hawq
   - https://github.com/ClickHouse/libhdfs3
   
   There are also a few bindings generated for it, one of which is https://github.com/datafusion-contrib/hdfs-native
   
   pros:
   - this approach is not too hard to implement, although the Rust bindings need some effort
   - it is almost a drop-in replacement for the Java wrapper in https://github.com/datafusion-contrib/datafusion-objectstore-hdfs, which needs some improvements as well
   - basic operations, including Kerberos, work
   - it IS more performant, less resource-hungry, and easier to use than the Java version

   cons:
   - bringing libhdfs to parity with the latest HDFS interface might take some effort
   
   2. The second approach is to write a native Rust HDFS library, and I believe @Kimahriman's https://github.com/Kimahriman/hdfs-native is on the right track. I haven't used the library and can't tell how performant it is, but IMHO it looks promising.
   
   pros:
   - we'd have an up-to-date HDFS library in Rust

   cons:
   - we need to invest some effort to get there
   