Kimahriman commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2061002935
> 2. Second approach is to write native rust hdfs library and I believe @Kimahriman https://github.com/Kimahriman/hdfs-native is on the right track. I haven't use the library and cant tell how performant it is but IMHO it looks he's on the right track.

Thanks for the call out! I agree there's no need to have HDFS support directly in this repo since the trait is public and it's a tricky thing to support. I actually have an object_store implementation on top of my library already: https://github.com/Kimahriman/hdfs-native/tree/master/crates/hdfs-native-object-store.

I've gotten pretty far with it at this point. I have some benchmarks that show reading/writing is at least on-par with the libhdfs based client, and RPC calls are even faster. I suspect performance would be even better in real scenarios, since the JVM client heavily makes use of multi-threading, which would help single-task benchmarks compared to my async setup.

The only major feature I'm tracking that is not supported right now is file encryption support via KMS. Not sure how widely that is used or not. The other limitations right now are:

- It dynamically links to `libgssapi_krb5` native lib (via the `libgssapi` crate), which makes cross compiling tricky/impossible with Kerberos support. I know there are other libs (like compression libraries) that I think use their native implementation, so I'd be curious how those work for cross compiling (compiled and statically linked instead of dynamically linked?).
- Reading and writing data isn't quite as resilient to failures as the Java client right now. Reading was a bit of an oversight I'm trying to fix now, writing is more complicated so it's currently just a "retry the whole thing if it fails" setup.

It's also not super heavily battle tested in various HDFS setups, but I haven't heard much yet of things not working for the few people who might be using it.

I've been meaning to try to get it integrated with `delta-rs`, but haven't gotten around to it since ideally I want it included in the Python wheels, but the libgssapi thing has had me stuck for a while.
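
For readers unfamiliar with the "retry the whole thing if it fails" approach to writes mentioned in the limitations, here is a minimal Rust sketch of the general idea. It is not code from hdfs-native: `retry_whole_write` and `demo_flaky_write` are hypothetical names, an in-memory `Vec<u8>` stands in for the remote file, and the failure is simulated.

```rust
use std::fmt::Debug;

/// Hypothetical helper (not part of hdfs-native): runs `attempt` from scratch
/// until it succeeds or `max_attempts` is used up. The closure receives the
/// 1-based attempt number and must redo the entire write each time.
fn retry_whole_write<T, E, F>(max_attempts: u32, mut attempt: F) -> Result<T, E>
where
    E: Debug,
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut n = 1;
    loop {
        match attempt(n) {
            Ok(value) => return Ok(value),
            // Attempts remain: log the error and start over from the beginning.
            Err(err) if n < max_attempts => {
                eprintln!("write attempt {n} failed: {err:?}; restarting");
                n += 1;
            }
            // Out of attempts: surface the last error to the caller.
            Err(err) => return Err(err),
        }
    }
}

/// Simulated write that fails partway through on the first `fail_first`
/// attempts. Returns the final result and whatever ended up in the sink.
fn demo_flaky_write(
    fail_first: u32,
    max_attempts: u32,
) -> (Result<usize, &'static str>, Vec<u8>) {
    let payload: &[u8] = b"hello hdfs";
    let mut sink: Vec<u8> = Vec::new(); // stand-in for the remote file

    let result = retry_whole_write(max_attempts, |n| {
        sink.clear(); // discard partial output left by a failed attempt
        sink.extend_from_slice(&payload[..4]);
        if n <= fail_first {
            return Err("simulated write failure");
        }
        sink.extend_from_slice(&payload[4..]);
        Ok(sink.len())
    });

    (result, sink)
}
```

The trade-off in this style is simplicity over efficiency: no partial progress is kept, so each failure costs a full rewrite, whereas a more resilient client would try to recover the write in place.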
