Kontinuation commented on PR #1992:
URL: https://github.com/apache/datafusion-comet/pull/1992#issuecomment-3050889742

   I took a look at the `datafusion-comet-objectstore-hdfs` module in Comet and
found that it largely overlaps with the Hadoop FileSystem bridge we are
building here. A better approach is to reuse
`datafusion-comet-objectstore-hdfs` and find a way to pass additional Hadoop
configurations to it. Users may configure credentials for accessing the
storage in the Spark configuration, so those settings must be passed through
correctly when constructing the `ObjectStore`.
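A minimal sketch of the configuration hand-off, assuming the Hadoop settings arrive as Spark properties that follow Spark's `spark.hadoop.` prefix convention (Spark copies such properties into the Hadoop `Configuration` with the prefix stripped). The helper name `hadoop_conf_from_spark` is hypothetical and is not an existing Comet API.

```rust
use std::collections::HashMap;

/// Spark forwards properties carrying this prefix into the Hadoop Configuration.
const SPARK_HADOOP_PREFIX: &str = "spark.hadoop.";

/// Hypothetical helper: keep only `spark.hadoop.*` entries and strip the prefix,
/// producing key/value pairs that could be handed to the Hadoop FileSystem layer
/// when the `ObjectStore` is constructed.
pub fn hadoop_conf_from_spark(spark_conf: &HashMap<String, String>) -> HashMap<String, String> {
    spark_conf
        .iter()
        .filter_map(|(k, v)| {
            k.strip_prefix(SPARK_HADOOP_PREFIX)
                // Ignore a bare "spark.hadoop." key with nothing after the prefix.
                .filter(|key| !key.is_empty())
                .map(|key| (key.to_string(), v.clone()))
        })
        .collect()
}
```

Non-Hadoop Spark settings such as `spark.executor.memory` are dropped, so only storage-related properties (for example S3A credentials) reach the native side.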
   
   `datafusion-comet-objectstore-hdfs` works as follows:
   
   ```
                                                  FFI            JNI
   datafusion-comet-objectstore-hdfs --> fs-hdfs -----> libhdfs -----> Hadoop File System (JVM)
   ```
   
   `libhdfs` and `fs-hdfs` should be able to support all Hadoop File System 
implementations, not just HDFS. The current problem is that `fs-hdfs` does not 
provide a way to instantiate an `HdfsFs` instance using custom Hadoop 
configurations. `libhdfs` does provide `hdfsBuilderConfSetStr`, so we need to 
open up new APIs in `fs-hdfs` to make use of it.
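A rough sketch of what such an `fs-hdfs` entry point might do, under stated assumptions: the trait `HdfsBuilderLike`, the function `apply_hadoop_conf`, and the `RecordingBuilder` test double are all hypothetical. A real binding would forward the trait methods over FFI to libhdfs's `hdfsBuilderSetNameNode` and `hdfsBuilderConfSetStr` (which returns 0 on success) before calling `hdfsBuilderConnect`; the trait keeps the logic runnable without a JVM.

```rust
use std::collections::BTreeMap;

/// Hypothetical abstraction over the libhdfs builder calls.
pub trait HdfsBuilderLike {
    fn set_name_node(&mut self, name_node: &str);
    /// Mirrors the C convention of `hdfsBuilderConfSetStr`: 0 on success, non-zero on error.
    fn conf_set_str(&mut self, key: &str, value: &str) -> i32;
}

/// Hypothetical new API: apply the name node and every extra Hadoop property
/// to the builder before a connection is made.
pub fn apply_hadoop_conf<B: HdfsBuilderLike>(
    builder: &mut B,
    name_node: &str,
    conf: &BTreeMap<String, String>,
) -> Result<(), String> {
    builder.set_name_node(name_node);
    // BTreeMap iterates in key order, so properties are applied deterministically.
    for (key, value) in conf {
        if builder.conf_set_str(key, value) != 0 {
            return Err(format!("failed to set Hadoop property {key}"));
        }
    }
    Ok(())
}

/// Test double that records what would have been passed to libhdfs.
#[derive(Default)]
pub struct RecordingBuilder {
    pub name_node: Option<String>,
    pub conf: Vec<(String, String)>,
}

impl HdfsBuilderLike for RecordingBuilder {
    fn set_name_node(&mut self, name_node: &str) {
        self.name_node = Some(name_node.to_string());
    }

    fn conf_set_str(&mut self, key: &str, value: &str) -> i32 {
        self.conf.push((key.to_string(), value.to_string()));
        0
    }
}
```

Keeping the builder behind a trait is only a design choice for this sketch; the real crate may expose the extra properties differently.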
   
   BTW, is there any concern about enabling HDFS support by default and
switching the default fs-hdfs dependency to `fs-hdfs3`?
https://github.com/apache/datafusion-comet/blob/d885f4a5fdd4a9f249523777e8e590f3eee0e2f7/native/hdfs/Cargo.toml#L34-L37


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org