This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new a14cc5600 [#4630] feat(iceberg): remove S3 SDK jar from Gravitino Iceberg REST server (#4631)
a14cc5600 is described below
commit a14cc56002bd8692453003d928c1211ba1dddbec
Author: FANNG <[email protected]>
AuthorDate: Fri Aug 23 10:48:53 2024 +0800
[#4630] feat(iceberg): remove S3 SDK jar from Gravitino Iceberg REST server (#4631)
### What changes were proposed in this pull request?
Remove the S3 SDK jar from the Gravitino Iceberg REST server.
### Why are the changes needed?
Fix: #4630
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Tested on the Iceberg REST server and the Gravitino Iceberg catalog.
---
docs/iceberg-rest-service.md | 4 +--
docs/lakehouse-iceberg-catalog.md | 2 +-
docs/spark-connector/spark-catalog-iceberg.md | 49 ++++++++++-----------------
gradle/libs.versions.toml | 3 --
iceberg/iceberg-common/build.gradle.kts | 2 --
5 files changed, 20 insertions(+), 40 deletions(-)
diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md
index 5389f934f..8341e5829 100644
--- a/docs/iceberg-rest-service.md
+++ b/docs/iceberg-rest-service.md
@@ -117,7 +117,7 @@ Gravitino Iceberg REST service supports using static access-key-id and secret-ac
For other Iceberg s3 properties not managed by Gravitino like `s3.sse.type`, you could config it directly by `gravitino.iceberg-rest.s3.sse.type`.
:::info
-Please set `gravitino.iceberg-rest.warehouse` to `s3://{bucket_name}/${prefix_name}` for Jdbc catalog backend, `s3a://{bucket_name}/${prefix_name}` for Hive catalog backend.
+To configure the JDBC catalog backend, set the `gravitino.iceberg-rest.warehouse` parameter to `s3://{bucket_name}/${prefix_name}`. For the Hive catalog backend, set `gravitino.iceberg-rest.warehouse` to `s3a://{bucket_name}/${prefix_name}`. Additionally, download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) and place it in the classpath of the Iceberg REST server.
:::
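[Editor's note] As a sketch of the note above, the warehouse setting might look like this in the Iceberg REST server configuration; the bucket and prefix names are placeholders, not values from the commit:

```properties
# JDBC catalog backend: the warehouse must use the s3:// scheme
gravitino.iceberg-rest.warehouse = s3://my-bucket/my-prefix

# Hive catalog backend: use the s3a:// scheme instead
# gravitino.iceberg-rest.warehouse = s3a://my-bucket/my-prefix
```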
#### HDFS configuration
@@ -284,7 +284,7 @@ For example, we can configure Spark catalog options to use Gravitino Iceberg RES
--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```
-You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access the data stored in S3, you need to download [iceberg-aws-bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark, no extra config is needed because S3 related properties is transferred from Iceberg REST server to Iceberg REST client automaticly.
+You may need to adjust the Iceberg Spark runtime jar file name according to the real version number in your environment. If you want to access data stored in S3, download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) jar and place it in the classpath of Spark. No extra configuration is needed, because the S3-related properties are transferred from the Iceberg REST server to the Iceberg REST client automatically.
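[Editor's note] An illustrative sketch of such a launch; the jar versions, the catalog name `rest`, and the paths are placeholders to adapt to your environment, not values mandated by this commit:

```shell
# Versions and file names below are examples; match your Spark/Iceberg setup.
spark-sql \
  --jars iceberg-spark-runtime-3.4_2.12-1.5.2.jar,iceberg-aws-bundle-1.5.2.jar \
  --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rest.type=rest \
  --conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/
```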
### Exploring Apache Iceberg with Apache Spark SQL
diff --git a/docs/lakehouse-iceberg-catalog.md b/docs/lakehouse-iceberg-catalog.md
index f20487fa1..96bbb1986 100644
--- a/docs/lakehouse-iceberg-catalog.md
+++ b/docs/lakehouse-iceberg-catalog.md
@@ -80,7 +80,7 @@ Supports using static access-key-id and secret-access-key to access S3 data.
For other Iceberg s3 properties not managed by Gravitino like `s3.sse.type`, you could config it directly by `gravitino.bypass.s3.sse.type`.
:::info
-Please set `gravitino.iceberg-rest.warehouse` to `s3://{bucket_name}/${prefix_name}` for JDBC catalog backend, `s3a://{bucket_name}/${prefix_name}` for Hive catalog backend.
+To configure the JDBC catalog backend, set the `warehouse` parameter to `s3://{bucket_name}/${prefix_name}`. For the Hive catalog backend, set `warehouse` to `s3a://{bucket_name}/${prefix_name}`. Additionally, download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) and place it in the `catalogs/lakehouse-iceberg/libs/` directory.
:::
#### Catalog backend security
diff --git a/docs/spark-connector/spark-catalog-iceberg.md b/docs/spark-connector/spark-catalog-iceberg.md
index 3bc616631..1e1855d2c 100644
--- a/docs/spark-connector/spark-catalog-iceberg.md
+++ b/docs/spark-connector/spark-catalog-iceberg.md
@@ -97,44 +97,29 @@ DESC EXTENDED employee;
For more details about `CALL`, please refer to the [Spark Procedures description](https://iceberg.apache.org/docs/1.5.2/spark-procedures/#spark-procedures) in Iceberg official document.
-## Apache Iceberg catalog backend support
-- HiveCatalog
-- JdbcCatalog
-- RESTCatalog
-
-### Catalog properties
+## Catalog properties
Gravitino spark connector will transform below property names which are defined in catalog properties to Spark Iceberg connector configuration.
-#### HiveCatalog
-
-| Gravitino catalog property name | Spark Iceberg connector configuration | Default Value | Required | Description               | Since Version |
-|---------------------------------|---------------------------------------|---------------|----------|---------------------------|---------------|
-| `catalog-backend`               | `type`                                | `memory`      | Yes      | Catalog backend type      | 0.5.0         |
-| `uri`                           | `uri`                                 | (none)        | Yes      | Catalog backend uri       | 0.5.0         |
-| `warehouse`                     | `warehouse`                           | (none)        | Yes      | Catalog backend warehouse | 0.5.0         |
-
-#### JdbcCatalog
-
-| Gravitino catalog property name | Spark Iceberg connector configuration | Default Value | Required | Description               | Since Version |
-|---------------------------------|---------------------------------------|---------------|----------|---------------------------|---------------|
-| `catalog-backend`               | `type`                                | `memory`      | Yes      | Catalog backend type      | 0.5.0         |
-| `uri`                           | `uri`                                 | (none)        | Yes      | Catalog backend uri       | 0.5.0         |
-| `warehouse`                     | `warehouse`                           | (none)        | Yes      | Catalog backend warehouse | 0.5.0         |
-| `jdbc-user`                     | `jdbc.user`                           | (none)        | Yes      | JDBC user name            | 0.5.0         |
-| `jdbc-password`                 | `jdbc.password`                       | (none)        | Yes      | JDBC password             | 0.5.0         |
+| Gravitino catalog property name | Spark Iceberg connector configuration | Description | Since Version |
+|---------------------------------|---------------------------------------|-------------|---------------|
+| `catalog-backend`               | `type`                                | Catalog backend type | 0.5.0 |
+| `uri`                           | `uri`                                 | Catalog backend uri | 0.5.0 |
+| `warehouse`                     | `warehouse`                           | Catalog backend warehouse | 0.5.0 |
+| `jdbc-user`                     | `jdbc.user`                           | JDBC user name | 0.5.0 |
+| `jdbc-password`                 | `jdbc.password`                       | JDBC password | 0.5.0 |
+| `io-impl`                       | `io-impl`                             | The IO implementation for `FileIO` in Iceberg. | 0.6.0 |
+| `s3-endpoint`                   | `s3.endpoint`                         | An alternative endpoint of the S3 service. This could be used for S3FileIO with any S3-compatible object storage service that has a different endpoint, or to access a private S3 endpoint in a virtual private cloud. | 0.6.0 |
+| `s3-region`                     | `client.region`                       | The region of the S3 service, like `us-west-2`. | 0.6.0 |
-#### RESTCatalog
-
-| Gravitino catalog property name | Spark Iceberg connector configuration | Default Value | Required | Description               | Since Version |
-|---------------------------------|---------------------------------------|---------------|----------|---------------------------|---------------|
-| `catalog-backend`               | `type`                                | `memory`      | Yes      | Catalog backend type      | 0.5.1         |
-| `uri`                           | `uri`                                 | (none)        | Yes      | Catalog backend uri       | 0.5.1         |
-| `warehouse`                     | `warehouse`                           | (none)        | No       | Catalog backend warehouse | 0.5.1         |
-
-Gravitino catalog property names with the prefix `spark.bypass.` are passed to Spark Iceberg connector. For example, using `spark.bypass.io-impl` to pass the `io-impl` to the Spark Iceberg connector.
+Gravitino catalog property names with the prefix `spark.bypass.` are passed to the Spark Iceberg connector. For example, use `spark.bypass.clients` to pass the `clients` property to the Spark Iceberg connector.
:::info
Iceberg catalog property `cache-enabled` is set to `false` internally and is not allowed to be changed.
:::
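[Editor's note] The property mapping and the `spark.bypass.` pass-through described above can be sketched as follows. This is a hypothetical illustration; the function and dictionary are assumptions for clarity, not Gravitino's actual connector code:

```python
# Hypothetical sketch of the Gravitino -> Spark Iceberg property mapping
# described in the table above. Not the connector's real implementation.

PROPERTY_MAP = {
    "catalog-backend": "type",
    "uri": "uri",
    "warehouse": "warehouse",
    "jdbc-user": "jdbc.user",
    "jdbc-password": "jdbc.password",
    "io-impl": "io-impl",
    "s3-endpoint": "s3.endpoint",
    "s3-region": "client.region",
}

BYPASS_PREFIX = "spark.bypass."


def to_spark_iceberg_conf(catalog_props):
    """Translate Gravitino catalog properties to Spark Iceberg connector config."""
    conf = {}
    for key, value in catalog_props.items():
        if key.startswith(BYPASS_PREFIX):
            # `spark.bypass.` properties are passed through with the prefix
            # stripped, e.g. `spark.bypass.clients` -> `clients`.
            conf[key[len(BYPASS_PREFIX):]] = value
        elif key in PROPERTY_MAP:
            conf[PROPERTY_MAP[key]] = value
    # Per the note above, `cache-enabled` is forced to false internally
    # and may not be overridden.
    conf["cache-enabled"] = "false"
    return conf
```

For example, `{"catalog-backend": "jdbc", "spark.bypass.clients": "4"}` would yield `{"type": "jdbc", "clients": "4", "cache-enabled": "false"}`.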
+## Storage
+
+### S3
+
+You need to add the S3 secrets to the Spark configuration using `spark.sql.catalog.${iceberg_catalog_name}.s3.access-key-id` and `spark.sql.catalog.${iceberg_catalog_name}.s3.secret-access-key`. Additionally, download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) and place it in the classpath of Spark.
\ No newline at end of file
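[Editor's note] For instance, with a catalog named `my_catalog` (a placeholder), the credentials described above could be supplied as Spark conf entries; the key values shown are stand-ins, not real credentials:

```shell
--conf spark.sql.catalog.my_catalog.s3.access-key-id=<your-access-key-id> \
--conf spark.sql.catalog.my_catalog.s3.secret-access-key=<your-secret-access-key>
```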
diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
index 2a28d4792..d9d0ab3cc 100644
--- a/gradle/libs.versions.toml
+++ b/gradle/libs.versions.toml
@@ -17,7 +17,6 @@
# under the License.
#
[versions]
-aws = "2.26.20"
junit = "5.8.1"
protoc = "3.24.4"
jackson = "2.15.2"
@@ -93,8 +92,6 @@ sun-activation-version = "1.2.0"
error-prone = "3.1.0"
[libraries]
-aws-s3 = { group = "software.amazon.awssdk", name = "s3", version.ref = "aws" }
-aws-sts = { group = "software.amazon.awssdk", name = "sts", version.ref = "aws" }
protobuf-java = { group = "com.google.protobuf", name = "protobuf-java", version.ref = "protoc" }
protobuf-java-util = { group = "com.google.protobuf", name = "protobuf-java-util", version.ref = "protoc" }
jackson-databind = { group = "com.fasterxml.jackson.core", name = "jackson-databind", version.ref = "jackson" }
diff --git a/iceberg/iceberg-common/build.gradle.kts b/iceberg/iceberg-common/build.gradle.kts
index be7542b5d..f01b61515 100644
--- a/iceberg/iceberg-common/build.gradle.kts
+++ b/iceberg/iceberg-common/build.gradle.kts
@@ -31,8 +31,6 @@ dependencies {
implementation(project(":server-common"))
implementation(libs.bundles.iceberg)
implementation(libs.bundles.log4j)
- implementation(libs.aws.s3)
- implementation(libs.aws.sts)
implementation(libs.caffeine)
implementation(libs.commons.lang3)
implementation(libs.guava)