This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new 55ad5fdee [#4895] docs(iceberg): add document for support not managed storages for Iceberg (#4896)
55ad5fdee is described below
commit 55ad5fdee7945bef22f9cf5905e159bad0c58d80
Author: FANNG <[email protected]>
AuthorDate: Thu Sep 12 14:55:13 2024 +0800
[#4895] docs(iceberg): add document for support not managed storages for Iceberg (#4896)
### What changes were proposed in this pull request?
For other storages not built in with Gravitino, we should add a document about how to use them.
### Why are the changes needed?
Fix: #4895
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
existing tests
---
docs/iceberg-rest-service.md | 23 ++++++++++++++++++-----
docs/lakehouse-iceberg-catalog.md | 14 ++++++++++++++
docs/spark-connector/spark-catalog-iceberg.md | 4 ++++
3 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md
index ab16feba9..a5760118c 100644
--- a/docs/iceberg-rest-service.md
+++ b/docs/iceberg-rest-service.md
@@ -18,10 +18,7 @@ The Apache Gravitino Iceberg REST Server follows the [Apache Iceberg REST API sp
- multi table transaction
- pagination
- Works as a catalog proxy, supporting `Hive` and `JDBC` as catalog backend.
-- Supports multi storage.
- - HDFS
- - S3
- - OSS
+- Supports different storages like `S3`, `HDFS`, `OSS`, `GCS` and provides the capability to support other storages.
- Supports OAuth2 and HTTPS.
- Provides a pluggable metrics store interface to store and delete Iceberg metrics.
@@ -162,6 +159,20 @@ You should place HDFS configuration file to the classpath of the Iceberg REST se
Builds with Hadoop 2.10.x. There may be compatibility issues when accessing Hadoop 3.x clusters.
:::
+#### Other storages
+
+For other storages that are not managed by Gravitino directly, you can manage them through custom catalog properties.
+
+| Configuration item               | Description                                                                               | Default value | Required | Since Version |
+|----------------------------------|-------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `gravitino.iceberg-rest.io-impl` | The IO implementation for `FileIO` in Iceberg, please use the fully qualified class name. | (none)        | No       | 0.6.0         |
+
+To pass custom properties such as `security-token` to your custom `FileIO`, you can configure it directly via `gravitino.iceberg-rest.security-token`. `security-token` will be included in the properties when the `initialize` method of `FileIO` is invoked.
+
+:::info
+Please set the `gravitino.iceberg-rest.warehouse` parameter to `{storage_prefix}://{bucket_name}/${prefix_name}`. Additionally, place the corresponding jars in the classpath of the Iceberg REST server: `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server.
+:::
+
### Catalog backend configuration
:::info
@@ -337,7 +348,9 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```
-You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in cloud, you need to download corresponding jars (please refer to the cloud storage part) and place it in the classpath of Spark, no extra config is needed because related properties is transferred from Iceberg REST server to Iceberg REST client automatically.
+You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access data stored in the cloud, you need to download the corresponding jars (please refer to the cloud storage part) and place them in the classpath of Spark; no extra config is needed because related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.
+
+For other storages not managed by Gravitino, the properties won't be transferred from the server to the client automatically. If you want to pass custom properties to initialize `FileIO`, you can add them via `spark.sql.catalog.${iceberg_catalog_name}.${configuration_key}` = `{property_value}`.
### Exploring Apache Iceberg with Apache Spark SQL
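Taken together, the server-side settings introduced above amount to a `gravitino.conf` fragment along these lines — a minimal sketch, where `com.example.MyFileIO`, the `mystorage` prefix, the bucket name, and the token value are all hypothetical placeholders, not part of this commit:

```properties
# Hypothetical gravitino.conf fragment for a storage not managed by Gravitino
gravitino.iceberg-rest.warehouse = mystorage://my-bucket/warehouse
# Fully qualified class name of a custom FileIO implementation
gravitino.iceberg-rest.io-impl = com.example.MyFileIO
# Custom property; included in the properties passed to FileIO's initialize method
gravitino.iceberg-rest.security-token = my-token
```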
diff --git a/docs/lakehouse-iceberg-catalog.md b/docs/lakehouse-iceberg-catalog.md
index e73f0ee39..8470da5b2 100644
--- a/docs/lakehouse-iceberg-catalog.md
+++ b/docs/lakehouse-iceberg-catalog.md
@@ -119,6 +119,20 @@ Please make sure the credential file is accessible by Gravitino, like using `exp
Please set `warehouse` to `gs://{bucket_name}/${prefix_name}`, and download [Iceberg gcp bundle jar](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it to `catalogs/lakehouse-iceberg/libs/`.
:::
+#### Other storages
+
+For other storages that are not managed by Gravitino directly, you can manage them through custom catalog properties.
+
+| Configuration item | Description                                                                               | Default value | Required | Since Version |
+|--------------------|-------------------------------------------------------------------------------------------|---------------|----------|---------------|
+| `io-impl`          | The IO implementation for `FileIO` in Iceberg, please use the fully qualified class name. | (none)        | No       | 0.6.0         |
+
+To pass custom properties such as `security-token` to your custom `FileIO`, you can configure it directly via `gravitino.bypass.security-token`. `security-token` will be included in the properties when the `initialize` method of `FileIO` is invoked.
+
+:::info
+Please set the `warehouse` parameter to `{storage_prefix}://{bucket_name}/${prefix_name}`. Additionally, place the corresponding jars in the `catalogs/lakehouse-iceberg/libs/` directory.
+:::
+
#### Catalog backend security
Users can use the following properties to configure the security of the catalog backend if needed. For example, if you are using a Kerberos Hive catalog backend, you must set `authentication.type` to `Kerberos` and provide `authentication.kerberos.principal` and `authentication.kerberos.keytab-uri`.
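The catalog-level counterpart of the section above can be sketched as catalog properties — hypothetical values only; the class name, storage prefix, bucket, and token are placeholders and not part of this commit:

```properties
# Hypothetical lakehouse-iceberg catalog properties
warehouse = mystorage://my-bucket/warehouse
# Fully qualified class name of a custom FileIO implementation
io-impl = com.example.MyFileIO
# Passed to FileIO's initialize method as `security-token`
gravitino.bypass.security-token = my-token
```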
diff --git a/docs/spark-connector/spark-catalog-iceberg.md b/docs/spark-connector/spark-catalog-iceberg.md
index 8552177f8..f0b1f2f64 100644
--- a/docs/spark-connector/spark-catalog-iceberg.md
+++ b/docs/spark-connector/spark-catalog-iceberg.md
@@ -132,3 +132,7 @@ You need to add OSS secret key to the Spark configuration using `spark.sql.catal
### GCS
No extra configuration is needed. Please make sure the credential file is accessible by Spark, like using `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json`, and download [Iceberg gcp bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it to the classpath of Spark.
+
+### Other storage
+
+You may need to add custom configurations with the format `spark.sql.catalog.${iceberg_catalog_name}.${configuration_key}`. Additionally, place the corresponding jars which implement `FileIO` in the classpath of Spark.
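The Spark-side pattern described above can be sketched as a launch command — a hypothetical invocation, where the catalog name `iceberg_catalog`, the `security-token` key, and the jar path are placeholders, not part of this commit:

```shell
# Hypothetical spark-sql invocation; adjust names and paths to your environment
spark-sql \
  --conf spark.sql.catalog.iceberg_catalog.security-token=my-token \
  --jars /path/to/custom-fileio.jar
```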