This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new 738cb6c86 [#4364] feat(iceberg): Support GCS storage for Iceberg REST server (#4627)
738cb6c86 is described below
commit 738cb6c867b6793681bd0459a7834084b620f5bd
Author: FANNG <[email protected]>
AuthorDate: Tue Sep 10 00:41:01 2024 +0800
[#4364] feat(iceberg): Support GCS storage for Iceberg REST server (#4627)
### What changes were proposed in this pull request?
Support GCS storage for Iceberg REST server
### Why are the changes needed?
Fix: #4364
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
1. start Iceberg REST server with following config:
```
gravitino.iceberg-rest.warehouse = gs://xxx/test
gravitino.iceberg-rest.io-impl= org.apache.iceberg.gcp.gcs.GCSFileIO
```
2. run Spark SQL to create an Iceberg table
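A minimal sketch of step 2, assuming a Spark session already configured with the Iceberg REST catalog as shown in the docs below (the catalog name `rest` and table names here are illustrative, not from this patch):

```sql
-- create and query an Iceberg table through the REST catalog (names are hypothetical)
CREATE TABLE rest.db.sample (id BIGINT, name STRING) USING iceberg;
INSERT INTO rest.db.sample VALUES (1, 'gcs-test');
SELECT * FROM rest.db.sample;
```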
---
LICENSE.bin | 1 +
docs/iceberg-rest-service.md | 18 +++++++++++++++++-
docs/lakehouse-iceberg-catalog.md | 16 ++++++++++++++++
docs/spark-connector/spark-catalog-iceberg.md | 6 +++++-
gradle/libs.versions.toml | 1 +
iceberg/iceberg-common/build.gradle.kts | 1 +
6 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/LICENSE.bin b/LICENSE.bin
index 27052f442..14d44b7d3 100644
--- a/LICENSE.bin
+++ b/LICENSE.bin
@@ -301,6 +301,7 @@
Apache Iceberg AWS
Apache Iceberg core
Apache Iceberg Hive metastore
+ Apache Iceberg GCP
Apache Ivy
Apache Log4j 1.x Compatibility API
Apache Log4j API
diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md
index e1c8e6f1e..1d8a20c40 100644
--- a/docs/iceberg-rest-service.md
+++ b/docs/iceberg-rest-service.md
@@ -138,6 +138,22 @@ For other Iceberg OSS properties not managed by Gravitino like `client.security-
Please set the `gravitino.iceberg-rest.warehouse` parameter to `oss://{bucket_name}/${prefix_name}`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the classpath of Iceberg REST server, `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server.
:::
+#### GCS
+
+Supports using a Google credential file to access GCS data.
+
+| Configuration item               | Description                                                                                          | Default value | Required | Since Version |
+|----------------------------------|----------------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `gravitino.iceberg-rest.io-impl` | The io implementation for `FileIO` in Iceberg, use `org.apache.iceberg.gcp.gcs.GCSFileIO` for GCS. | (none) | No | 0.6.0 |
+
+For other Iceberg GCS properties not managed by Gravitino, like `gcs.project-id`, you can configure them directly via `gravitino.iceberg-rest.gcs.project-id`.
+
+Please make sure the credential file is accessible to Gravitino, for example by running `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json` before the Gravitino Iceberg REST server is started.
+
+:::info
+Please set `gravitino.iceberg-rest.warehouse` to `gs://{bucket_name}/${prefix_name}`, and download the [Iceberg GCP bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it in the classpath of the Gravitino Iceberg REST server: `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server.
+:::
+
#### HDFS configuration
You should place HDFS configuration file to the classpath of the Iceberg REST server, `iceberg-rest-server/conf` for Gravitino server package, `conf` for standalone Gravitino Iceberg REST server package. When writing to HDFS, the Gravitino Iceberg REST catalog service can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See [How to access Apache Hadoop](gravitino-server-config.md#how-to-access-apache-hadoop) for more details.
@@ -321,7 +337,7 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```
-You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in S3, you need to download [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark, no extra config is needed because S3 related properties is transferred from Iceberg REST server to Iceberg REST client automaticly.
+You may need to adjust the Iceberg Spark runtime jar file name according to the actual version number in your environment. If you want to access data stored in cloud storage, you need to download the corresponding jars (please refer to the cloud storage part) and place them in the classpath of Spark; no extra configuration is needed because the related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.
### Exploring Apache Iceberg with Apache Spark SQL
diff --git a/docs/lakehouse-iceberg-catalog.md b/docs/lakehouse-iceberg-catalog.md
index 0fc3af157..e73f0ee39 100644
--- a/docs/lakehouse-iceberg-catalog.md
+++ b/docs/lakehouse-iceberg-catalog.md
@@ -103,6 +103,22 @@ For other Iceberg OSS properties not managed by Gravitino like `client.security-
Please set the `warehouse` parameter to `oss://{bucket_name}/${prefix_name}`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the `catalogs/lakehouse-iceberg/libs/` directory.
:::
+#### GCS
+
+Supports using a Google credential file to access GCS data.
+
+| Configuration item     | Description                                                                                          | Default value | Required | Since Version |
+|------------------------|----------------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `io-impl` | The io implementation for `FileIO` in Iceberg, use `org.apache.iceberg.gcp.gcs.GCSFileIO` for GCS. | (none) | No | 0.6.0 |
+
+For other Iceberg GCS properties not managed by Gravitino, like `gcs.project-id`, you can configure them directly via `gravitino.bypass.gcs.project-id`.
+
+Please make sure the credential file is accessible to Gravitino, for example by running `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json` before the Gravitino server is started.
+
+:::info
+Please set `warehouse` to `gs://{bucket_name}/${prefix_name}`, and download the [Iceberg GCP bundle jar](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it in `catalogs/lakehouse-iceberg/libs/`.
+:::
+
#### Catalog backend security
Users can use the following properties to configure the security of the
catalog backend if needed. For example, if you are using a Kerberos Hive
catalog backend, you must set `authentication.type` to `Kerberos` and provide
`authentication.kerberos.principal` and `authentication.kerberos.keytab-uri`.
diff --git a/docs/spark-connector/spark-catalog-iceberg.md b/docs/spark-connector/spark-catalog-iceberg.md
index b13b0ccf9..8552177f8 100644
--- a/docs/spark-connector/spark-catalog-iceberg.md
+++ b/docs/spark-connector/spark-catalog-iceberg.md
@@ -127,4 +127,8 @@ You need to add s3 secret to the Spark configuration using `spark.sql.catalog.${
### OSS
-You need to add OSS secret key to the Spark configuration using `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-id` and `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-secret`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the classpath of Spark.
\ No newline at end of file
+You need to add OSS secret key to the Spark configuration using `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-id` and `spark.sql.catalog.${iceberg_catalog_name}.client.access-key-secret`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the classpath of Spark.
+
+### GCS
+
+No extra configuration is needed. Please make sure the credential file is accessible to Spark, for example by running `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json`, and download the [Iceberg GCP bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it in the classpath of Spark.
diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
index 6980414c0..4efb10eb2 100644
--- a/gradle/libs.versions.toml
+++ b/gradle/libs.versions.toml
@@ -162,6 +162,7 @@ iceberg-aws = { group = "org.apache.iceberg", name = "iceberg-aws", version.ref
iceberg-core = { group = "org.apache.iceberg", name = "iceberg-core", version.ref = "iceberg" }
iceberg-api = { group = "org.apache.iceberg", name = "iceberg-api", version.ref = "iceberg" }
iceberg-hive-metastore = { group = "org.apache.iceberg", name = "iceberg-hive-metastore", version.ref = "iceberg" }
+iceberg-gcp = { group = "org.apache.iceberg", name = "iceberg-gcp", version.ref = "iceberg" }
paimon-core = { group = "org.apache.paimon", name = "paimon-core", version.ref = "paimon" }
paimon-format = { group = "org.apache.paimon", name = "paimon-format", version.ref = "paimon" }
paimon-hive-catalog = { group = "org.apache.paimon", name = "paimon-hive-catalog", version.ref = "paimon" }
diff --git a/iceberg/iceberg-common/build.gradle.kts b/iceberg/iceberg-common/build.gradle.kts
index fcb0a2b1f..d0964e2da 100644
--- a/iceberg/iceberg-common/build.gradle.kts
+++ b/iceberg/iceberg-common/build.gradle.kts
@@ -37,6 +37,7 @@ dependencies {
implementation(libs.iceberg.aliyun)
implementation(libs.iceberg.aws)
implementation(libs.iceberg.hive.metastore)
+ implementation(libs.iceberg.gcp)
implementation(libs.hadoop2.common) {
exclude("com.github.spotbugs")
exclude("com.sun.jersey")