yuqi1129 commented on PR #5079:
URL: https://github.com/apache/gravitino/pull/5079#issuecomment-2418988090
@jerryshao
I have verified the code. On the client side, users need to include the
following dependencies if they want to use a GCS fileset:
```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>3.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.gravitino</groupId>
  <artifactId>gcp-bundle</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>org.apache.gravitino</groupId>
  <artifactId>filesystem-hadoop3-runtime</artifactId>
  <version>0.7.0-incubating-SNAPSHOT</version>
</dependency>
```
As suggested by @xloya, bundling `hadoop-common` and `hadoop-hdfs-client`
into `filesystem-hadoop3-runtime` may cause conflicts, because some query
engines such as Spark and Trino may already have HDFS-related jars in their
context.
We need to include `hadoop-hdfs-client` because
`filesystem-hadoop3-runtime` shades `hadoop-catalog`, and `hadoop-catalog`
contains `DistributedFileSystem` initialization logic, even though I only want
to use GCS.
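
For anyone who wants to check their own client setup, here is a minimal sketch that reports whether the classes behind these dependencies are visible on the classpath. It is only an illustration, not part of this PR: `ClasspathCheck` and `isOnClasspath` are hypothetical names, and the class-to-artifact mapping is my assumption (in particular, that `GravitinoVirtualFileSystem` is exposed unrelocated by `filesystem-hadoop3-runtime`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: checks whether the classes a GCS fileset client is
// expected to need can be located on the current classpath.
class ClasspathCheck {

  // Returns true if the class can be located. The class is not initialized,
  // so no static initializers run as a side effect of the check.
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, ClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed mapping from a representative class to the artifact providing it.
    Map<String, String> required = new LinkedHashMap<>();
    required.put("org.apache.hadoop.fs.FileSystem", "hadoop-common");
    required.put("org.apache.hadoop.hdfs.DistributedFileSystem", "hadoop-hdfs-client");
    required.put(
        "org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem",
        "filesystem-hadoop3-runtime");

    for (Map.Entry<String, String> entry : required.entrySet()) {
      String status = isOnClasspath(entry.getKey()) ? "found" : "MISSING";
      System.out.println(
          status + "  " + entry.getKey() + "  (expected from " + entry.getValue() + ")");
    }
  }
}
```

Running it inside Spark or Trino should show whether the engine already provides the Hadoop classes, which is the conflict scenario mentioned above.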