This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new 55ad5fdee [#4895] docs(iceberg): add document for support not managed storages for Iceberg (#4896)
55ad5fdee is described below
commit 55ad5fdee7945bef22f9cf5905e159bad0c58d80
Author: FANNG <[email protected]>
AuthorDate: Thu Sep 12 14:55:13 2024 +0800
[#4895] docs(iceberg): add document for support not managed storages for Iceberg (#4896)
### What changes were proposed in this pull request?
For other storages not built in with Gravitino, we should add a document about how to use them.
### Why are the changes needed?
Fix: #4895
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
existing tests
---
docs/iceberg-rest-service.md | 23 ++++++++++++++++++-----
docs/lakehouse-iceberg-catalog.md | 14 ++++++++++++++
docs/spark-connector/spark-catalog-iceberg.md | 4 ++++
3 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md
index ab16feba9..a5760118c 100644
--- a/docs/iceberg-rest-service.md
+++ b/docs/iceberg-rest-service.md
@@ -18,10 +18,7 @@ The Apache Gravitino Iceberg REST Server follows the [Apache Iceberg REST API sp
- multi table transaction
- pagination
- Works as a catalog proxy, supporting `Hive` and `JDBC` as catalog backend.
-- Supports multi storage.
- - HDFS
- - S3
- - OSS
+- Supports different storages like `S3`, `HDFS`, `OSS`, `GCS` and provides the capability to support other storages.
- Supports OAuth2 and HTTPS.
- Provides a pluggable metrics store interface to store and delete Iceberg metrics.
@@ -162,6 +159,20 @@ You should place HDFS configuration file to the classpath of the Iceberg REST se
Builds with Hadoop 2.10.x. There may be compatibility issues when accessing Hadoop 3.x clusters.
:::
+#### Other storages
+
+For other storages that are not managed by Gravitino directly, you can manage them through custom catalog properties.
+
+| Configuration item               | Description                                                                               | Default value | Required | Since Version |
+|----------------------------------|-------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `gravitino.iceberg-rest.io-impl` | The IO implementation for `FileIO` in Iceberg, please use the fully qualified class name. | (none)        | No       | 0.6.0         |
+
+To pass custom properties such as `security-token` to your custom `FileIO`, you can configure it directly via `gravitino.iceberg-rest.security-token`. `security-token` will be included in the properties when the `initialize` method of `FileIO` is invoked.
+
+:::info
+Please set the `gravitino.iceberg-rest.warehouse` parameter to `{storage_prefix}://{bucket_name}/${prefix_name}`. Additionally, place the corresponding jars in the classpath of the Iceberg REST server: `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server.
+:::
+
### Catalog backend configuration
:::info
@@ -337,7 +348,9 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```
-You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in cloud, you need to download corresponding jars (please refer to the cloud storage part) and place it in the classpath of Spark, no extra config is needed because related properties is transferred from Iceberg REST server to Iceberg REST client automatically.
+You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access data stored in the cloud, you need to download the corresponding jars (please refer to the cloud storage part) and place them in the classpath of Spark; no extra config is needed because related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.
+
+For other storages not managed by Gravitino, the properties won't be transferred from the server to the client automatically. If you want to pass custom properties to initialize `FileIO`, you can add them via `spark.sql.catalog.${iceberg_catalog_name}.${configuration_key}` = `{property_value}`.
### Exploring Apache Iceberg with Apache Spark SQL
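Taken together, the server-side settings introduced above amount to a `gravitino.conf` fragment along these lines — a minimal sketch, where `com.example.MyFileIO`, the `mystorage` prefix, the bucket name, and the token value are all hypothetical placeholders, not part of this commit:

```properties
# Hypothetical gravitino.conf fragment for a storage not managed by Gravitino
gravitino.iceberg-rest.warehouse = mystorage://my-bucket/warehouse
# Fully qualified class name of a custom FileIO implementation
gravitino.iceberg-rest.io-impl = com.example.MyFileIO
# Custom property; included in the properties passed to FileIO's initialize method
gravitino.iceberg-rest.security-token = my-token
```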
diff --git a/docs/lakehouse-iceberg-catalog.md b/docs/lakehouse-iceberg-catalog.md
index e73f0ee39..8470da5b2 100644
--- a/docs/lakehouse-iceberg-catalog.md
+++ b/docs/lakehouse-iceberg-catalog.md
@@ -119,6 +119,20 @@ Please make sure the credential file is accessible by Gravitino, like using `exp
Please set `warehouse` to `gs://{bucket_name}/${prefix_name}`, and download [Iceberg gcp bundle jar](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it to `catalogs/lakehouse-iceberg/libs/`.
:::
+#### Other storages
+
+For other storages that are not managed by Gravitino directly, you can manage them through custom catalog properties.
+
+| Configuration item | Description                                                                               | Default value | Required | Since Version |
+|--------------------|-------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `io-impl`          | The IO implementation for `FileIO` in Iceberg, please use the fully qualified class name. | (none)        | No       | 0.6.0         |
+
+To pass custom properties such as `security-token` to your custom `FileIO`, you can configure it directly via `gravitino.bypass.security-token`. `security-token` will be included in the properties when the `initialize` method of `FileIO` is invoked.
+
+:::info
+Please set the `warehouse` parameter to `{storage_prefix}://{bucket_name}/${prefix_name}`. Additionally, place the corresponding jars in the `catalogs/lakehouse-iceberg/libs/` directory.
+:::
+
#### Catalog backend security
Users can use the following properties to configure the security of the catalog backend if needed. For example, if you are using a Kerberos Hive catalog backend, you must set `authentication.type` to `Kerberos` and provide `authentication.kerberos.principal` and `authentication.kerberos.keytab-uri`.
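The catalog-level counterpart of the section above can be sketched as catalog properties — hypothetical values only; the class name, storage prefix, bucket, and token are placeholders and not part of this commit:

```properties
# Hypothetical lakehouse-iceberg catalog properties
warehouse = mystorage://my-bucket/warehouse
# Fully qualified class name of a custom FileIO implementation
io-impl = com.example.MyFileIO
# Passed to FileIO's initialize method as `security-token`
gravitino.bypass.security-token = my-token
```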
diff --git a/docs/spark-connector/spark-catalog-iceberg.md b/docs/spark-connector/spark-catalog-iceberg.md
index 8552177f8..f0b1f2f64 100644
--- a/docs/spark-connector/spark-catalog-iceberg.md
+++ b/docs/spark-connector/spark-catalog-iceberg.md
@@ -132,3 +132,7 @@ You need to add OSS secret key to the Spark configuration using `spark.sql.catal
### GCS
No extra configuration is needed. Please make sure the credential file is accessible by Spark, like using `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json`, and download [Iceberg gcp bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it to the classpath of Spark.
+
+### Other storage
+
+You may need to add custom configurations with the format `spark.sql.catalog.${iceberg_catalog_name}.${configuration_key}`. Additionally, place the corresponding jars which implement `FileIO` in the classpath of Spark.
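The Spark-side pattern described above can be sketched as a launch command — a hypothetical invocation, where the catalog name `iceberg_catalog`, the `security-token` key, and the jar path are placeholders, not part of this commit:

```shell
# Hypothetical spark-sql invocation; adjust names and paths to your environment
spark-sql \
  --conf spark.sql.catalog.iceberg_catalog.security-token=my-token \
  --jars /path/to/custom-fileio.jar
```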