[iceberg] 01/01: Docs: Update AWS integration docs to use iceberg-aws-bundle

etudenhoefner Thu, 10 Aug 2023 01:28:20 -0700

This is an automated email from the ASF dual-hosted git repository.

etudenhoefner pushed a commit to branch aws-integration-docs
in repository https://gitbox.apache.org/repos/asf/iceberg.git


commit 43043f9f5d02e8b25e6d9c36cedbd153745d95a8
Author: Eduard Tudenhoefner <[email protected]>
AuthorDate: Thu Aug 10 10:27:38 2023 +0200

    Docs: Update AWS integration docs to use iceberg-aws-bundle
---
 docs/aws.md | 43 +++++++------------------------------------
 1 file changed, 7 insertions(+), 36 deletions(-)

diff --git a/docs/aws.md b/docs/aws.md
index ecabffe77e..51b28b7344 100644
--- a/docs/aws.md
+++ b/docs/aws.md
@@ -49,25 +49,11 @@ Here are some examples.
 
 ### Spark
 
-For example, to use AWS features with Spark 3.3 (with scala 2.12) and AWS clients version 2.20.18, you can start the Spark SQL shell with:
+For example, to use AWS features with Spark 3.4 (with scala 2.12) and AWS clients version 2.20.18 (which is packaged in the `iceberg-aws-bundle`), you can start the Spark SQL shell with:
 
 ```sh
-# add Iceberg dependency
-ICEBERG_VERSION={{% icebergVersion %}}
-DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:$ICEBERG_VERSION"
-
-# add AWS dependency
-AWS_SDK_VERSION=2.20.18
-AWS_MAVEN_GROUP=software.amazon.awssdk
-AWS_PACKAGES=(
-    "bundle"
-)
-for pkg in "${AWS_PACKAGES[@]}"; do
-    DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
-done
-
 # start Spark SQL client shell
-spark-sql --packages $DEPENDENCIES \
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
     --conf spark.sql.defaultCatalog=my_catalog \
     --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
@@ -75,7 +61,7 @@ spark-sql --packages $DEPENDENCIES \
     --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
 ```
 
-As you can see, In the shell command, we use `--packages` to specify the additional AWS bundle and HTTP client dependencies with their version as `2.20.18`.
+As you can see, In the shell command, we use `--packages` to specify the additional `iceberg-aws-bundle` that contains all relevant AWS dependencies.
 
 ### Flink
 
@@ -87,21 +73,12 @@ ICEBERG_VERSION={{% icebergVersion %}}
 MAVEN_URL=https://repo1.maven.org/maven2
 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
 wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar
-
-# download AWS dependency
-AWS_SDK_VERSION=2.20.18
-AWS_MAVEN_URL=$MAVEN_URL/software/amazon/awssdk
-AWS_PACKAGES=(
-    "bundle"
-)
-for pkg in "${AWS_PACKAGES[@]}"; do
-    wget $AWS_MAVEN_URL/$pkg/$AWS_SDK_VERSION/$pkg-$AWS_SDK_VERSION.jar
-done
+wget $ICEBERG_MAVEN_URL/iceberg-aws-bundle/$ICEBERG_VERSION/iceberg-aws-bundle-$ICEBERG_VERSION.jar
 
 # start Flink SQL client shell
 /path/to/bin/sql-client.sh embedded \
     -j iceberg-flink-runtime-$ICEBERG_VERSION.jar \
-    -j bundle-$AWS_SDK_VERSION.jar \
+    -j iceberg-aws-bundle-$ICEBERG_VERSION.jar \
     shell
 ```
 
@@ -562,7 +539,7 @@ The Glue, S3 and DynamoDB clients are then initialized with the assume-role cred
 Here is an example to start Spark shell with this client factory:
 
 ```shell
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime:{{% icebergVersion %}},software.amazon.awssdk:bundle:2.20.18 \
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
     --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \    
     --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
@@ -641,22 +618,17 @@ For versions before 6.5.0, you can use a [bootstrap action](https://docs.aws.ama
 ```sh
 #!/bin/bash
 
-AWS_SDK_VERSION=2.20.18
 ICEBERG_VERSION={{% icebergVersion %}}
 MAVEN_URL=https://repo1.maven.org/maven2
 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
-AWS_MAVEN_URL=$MAVEN_URL/software/amazon/awssdk
 # NOTE: this is just an example shared class path between Spark and Flink,
 #  please choose a proper class path for production.
 LIB_PATH=/usr/share/aws/aws-java-sdk/
 
-AWS_PACKAGES=(
-  "bundle"
-)
-
 ICEBERG_PACKAGES=(
   "iceberg-spark-runtime-3.3_2.12"
   "iceberg-flink-runtime"
+  "iceberg-aws-bundle"
 )
 
 install_dependencies () {
@@ -671,7 +643,6 @@ install_dependencies () {
 }
 
 install_dependencies $LIB_PATH $ICEBERG_MAVEN_URL $ICEBERG_VERSION "${ICEBERG_PACKAGES[@]}"
-install_dependencies $LIB_PATH $AWS_MAVEN_URL $AWS_SDK_VERSION "${AWS_PACKAGES[@]}"
 ```
 
 ### AWS Glue
