Kontinuation commented on PR #1992:
URL:
https://github.com/apache/datafusion-comet/pull/1992#issuecomment-3050889742
I took a look at the `datafusion-comet-objectstore-hdfs` module in Comet and
found that it largely overlaps with the Hadoop FileSystem bridge we are
building here. A better approach is to reuse
`datafusion-comet-objectstore-hdfs` and find a way to pass additional Hadoop
configurations to it. Users may configure credentials for accessing the
storage in the Spark configuration, so passing them through correctly when
constructing the `ObjectStore` is necessary.
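For illustration, here is a minimal sketch of how Spark configuration entries could be translated into Hadoop configuration entries, following Spark's convention that keys prefixed with `spark.hadoop.` are forwarded to the Hadoop `Configuration` (the helper name is hypothetical, not part of Comet today):

```rust
use std::collections::HashMap;

/// Extract Hadoop configuration entries from a Spark configuration by
/// stripping the `spark.hadoop.` prefix; all other Spark keys are ignored.
/// (Hypothetical helper for illustration.)
fn hadoop_conf_from_spark(spark_conf: &HashMap<String, String>) -> HashMap<String, String> {
    spark_conf
        .iter()
        .filter_map(|(k, v)| {
            k.strip_prefix("spark.hadoop.")
                .map(|hadoop_key| (hadoop_key.to_string(), v.clone()))
        })
        .collect()
}

fn main() {
    let mut spark_conf = HashMap::new();
    spark_conf.insert(
        "spark.hadoop.fs.s3a.endpoint".to_string(),
        "http://localhost:9000".to_string(),
    );
    spark_conf.insert("spark.executor.memory".to_string(), "4g".to_string());

    // Only the `spark.hadoop.`-prefixed entry survives, with the prefix removed.
    for (k, v) in hadoop_conf_from_spark(&spark_conf) {
        println!("{k}={v}");
    }
}
```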
`datafusion-comet-objectstore-hdfs` works as follows:
```
                                        FFI          JNI
datafusion-comet-objectstore-hdfs --> fs-hdfs --> libhdfs --> Hadoop FileSystem (JVM)
```
`libhdfs` and `fs-hdfs` should be able to support all Hadoop File System
implementations, not just HDFS. The current problem is that `fs-hdfs` does not
provide a way to instantiate an `HdfsFs` instance using custom Hadoop
configurations. `libhdfs` does provide `hdfsBuilderConfSetStr`, so we need to
open up new APIs in `fs-hdfs` to make use of it.
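To make the idea concrete, here is a hypothetical sketch of what such an API in `fs-hdfs` could look like; `HdfsBuilder`, `set_conf`, and `conf_entries` are made-up names, not the crate's current API. The builder collects key/value pairs; in a real implementation, connecting would pass each pair to libhdfs via `hdfsBuilderConfSetStr` before `hdfsBuilderConnect`:

```rust
/// Hypothetical builder that fs-hdfs could expose so callers can supply
/// custom Hadoop configuration before connecting.
pub struct HdfsBuilder {
    conf: Vec<(String, String)>,
}

impl HdfsBuilder {
    pub fn new() -> Self {
        Self { conf: Vec::new() }
    }

    /// Queue a Hadoop configuration entry; mirrors libhdfs's
    /// hdfsBuilderConfSetStr, which would be called at connect time.
    pub fn set_conf(mut self, key: &str, value: &str) -> Self {
        self.conf.push((key.to_string(), value.to_string()));
        self
    }

    /// Inspect the queued configuration (stand-in for the FFI connect step).
    pub fn conf_entries(&self) -> &[(String, String)] {
        &self.conf
    }
}

fn main() {
    let builder = HdfsBuilder::new()
        .set_conf("fs.s3a.access.key", "EXAMPLE_KEY")
        .set_conf("fs.s3a.secret.key", "EXAMPLE_SECRET");
    for (k, v) in builder.conf_entries() {
        println!("{k}={v}");
    }
}
```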
By the way, is there any concern with enabling HDFS support by default and
switching the default fs-hdfs dependency to `fs-hdfs3`?
https://github.com/apache/datafusion-comet/blob/d885f4a5fdd4a9f249523777e8e590f3eee0e2f7/native/hdfs/Cargo.toml#L34-L37
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]