Kontinuation commented on PR #1992: URL: https://github.com/apache/datafusion-comet/pull/1992#issuecomment-3050889742
I took a look at the `datafusion-comet-objectstore-hdfs` module in Comet and found that it largely overlaps with the Hadoop FileSystem bridge we are building here. A better approach is to reuse `datafusion-comet-objectstore-hdfs` and find a way to pass additional Hadoop configurations to it. Users may configure credentials for accessing the storage in the Spark configuration, so passing them through correctly when constructing the `ObjectStore` is necessary.

`datafusion-comet-objectstore-hdfs` works as follows:

```
                                               FFI            JNI
datafusion-comet-objectstore-hdfs --> fs-hdfs -----> libhdfs -----> Hadoop File System (JVM)
```

`libhdfs` and `fs-hdfs` should be able to support all Hadoop FileSystem implementations, not just HDFS. The current problem is that `fs-hdfs` does not provide a way to instantiate an `HdfsFs` instance using custom Hadoop configurations. `libhdfs` does provide `hdfsBuilderConfSetStr`, so we need to open up new APIs in `fs-hdfs` to make use of it (see the sketch at the end of this comment).

BTW, is there any concern with enabling HDFS support by default and switching the default fs-hdfs dependency to `fs-hdfs3`?

https://github.com/apache/datafusion-comet/blob/d885f4a5fdd4a9f249523777e8e590f3eee0e2f7/native/hdfs/Cargo.toml#L34-L37
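For illustration, here is a minimal sketch of what a configuration-aware constructor in `fs-hdfs` could look like, assuming the raw libhdfs builder functions are linked in. The `extern "C"` declarations mirror the libhdfs builder API (`hdfsNewBuilder`, `hdfsBuilderSetNameNode`, `hdfsBuilderConfSetStr`, `hdfsBuilderConnect`, `hdfsFreeBuilder`); the wrapper name `connect_with_conf` and its signature are hypothetical, not part of the current `fs-hdfs` API.

```rust
// Hypothetical sketch only: `connect_with_conf` does not exist in fs-hdfs today.
use std::collections::HashMap;
use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_void};

// Opaque libhdfs handles.
#[repr(C)]
pub struct hdfsBuilder {
    _private: [u8; 0],
}
pub type hdfsFS = *mut c_void;

extern "C" {
    fn hdfsNewBuilder() -> *mut hdfsBuilder;
    fn hdfsBuilderSetNameNode(bld: *mut hdfsBuilder, nn: *const c_char);
    fn hdfsBuilderConfSetStr(bld: *mut hdfsBuilder, key: *const c_char, val: *const c_char) -> c_int;
    fn hdfsBuilderConnect(bld: *mut hdfsBuilder) -> hdfsFS;
    fn hdfsFreeBuilder(bld: *mut hdfsBuilder);
}

/// Connect to a Hadoop-compatible file system, applying extra Hadoop
/// configuration entries (e.g. credentials collected from `spark.hadoop.*`
/// properties) before the connection is established.
pub fn connect_with_conf(uri: &str, conf: &HashMap<String, String>) -> Result<hdfsFS, String> {
    let uri_c = CString::new(uri).map_err(|e| e.to_string())?;
    // libhdfs keeps shallow references to the key/value strings until
    // hdfsBuilderConnect runs, so keep the CStrings alive in this Vec.
    let mut kv_storage: Vec<(CString, CString)> = Vec::with_capacity(conf.len());
    for (k, v) in conf {
        kv_storage.push((
            CString::new(k.as_str()).map_err(|e| e.to_string())?,
            CString::new(v.as_str()).map_err(|e| e.to_string())?,
        ));
    }
    unsafe {
        let builder = hdfsNewBuilder();
        if builder.is_null() {
            return Err("hdfsNewBuilder returned null".to_string());
        }
        hdfsBuilderSetNameNode(builder, uri_c.as_ptr());
        for (k, v) in &kv_storage {
            if hdfsBuilderConfSetStr(builder, k.as_ptr(), v.as_ptr()) != 0 {
                hdfsFreeBuilder(builder);
                return Err(format!("failed to set config key {:?}", k));
            }
        }
        // hdfsBuilderConnect consumes and frees the builder.
        let fs = hdfsBuilderConnect(builder);
        if fs.is_null() {
            Err(format!("failed to connect to {uri}"))
        } else {
            Ok(fs)
        }
    }
}
```

With something like this exposed from `fs-hdfs`, `datafusion-comet-objectstore-hdfs` could forward the relevant `spark.hadoop.*` entries it receives from the JVM when constructing the `ObjectStore`, instead of relying only on whatever `core-site.xml`/`hdfs-site.xml` happen to be on the classpath.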