This is an automated email from the ASF dual-hosted git repository.

etudenhoefner pushed a commit to branch aws-integration-docs
in repository https://gitbox.apache.org/repos/asf/iceberg.git

commit 43043f9f5d02e8b25e6d9c36cedbd153745d95a8
Author: Eduard Tudenhoefner <[email protected]>
AuthorDate: Thu Aug 10 10:27:38 2023 +0200

    Docs: Update AWS integration docs to use iceberg-aws-bundle
---
 docs/aws.md | 43 +++++++------------------------------------
 1 file changed, 7 insertions(+), 36 deletions(-)

diff --git a/docs/aws.md b/docs/aws.md
index ecabffe77e..51b28b7344 100644
--- a/docs/aws.md
+++ b/docs/aws.md
@@ -49,25 +49,11 @@ Here are some examples.
 
 ### Spark
 
-For example, to use AWS features with Spark 3.3 (with scala 2.12) and AWS clients version 2.20.18, you can start the Spark SQL shell with:
+For example, to use AWS features with Spark 3.4 (with scala 2.12) and AWS clients version 2.20.18 (which is packaged in the `iceberg-aws-bundle`), you can start the Spark SQL shell with:
 
 ```sh
-# add Iceberg dependency
-ICEBERG_VERSION={{% icebergVersion %}}
-DEPENDENCIES="org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:$ICEBERG_VERSION"
-
-# add AWS dependency
-AWS_SDK_VERSION=2.20.18
-AWS_MAVEN_GROUP=software.amazon.awssdk
-AWS_PACKAGES=(
-  "bundle"
-)
-for pkg in "${AWS_PACKAGES[@]}"; do
-  DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
-done
-
 # start Spark SQL client shell
-spark-sql --packages $DEPENDENCIES \
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
     --conf spark.sql.defaultCatalog=my_catalog \
     --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
@@ -75,7 +61,7 @@ spark-sql --packages $DEPENDENCIES \
     --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
 ```
 
-As you can see, In the shell command, we use `--packages` to specify the additional AWS bundle and HTTP client dependencies with their version as `2.20.18`.
+As you can see, In the shell command, we use `--packages` to specify the additional `iceberg-aws-bundle` that contains all relevant AWS dependencies.
 
 ### Flink
 
@@ -87,21 +73,12 @@ ICEBERG_VERSION={{% icebergVersion %}}
 MAVEN_URL=https://repo1.maven.org/maven2
 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
 wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar
-
-# download AWS dependency
-AWS_SDK_VERSION=2.20.18
-AWS_MAVEN_URL=$MAVEN_URL/software/amazon/awssdk
-AWS_PACKAGES=(
-  "bundle"
-)
-for pkg in "${AWS_PACKAGES[@]}"; do
-  wget $AWS_MAVEN_URL/$pkg/$AWS_SDK_VERSION/$pkg-$AWS_SDK_VERSION.jar
-done
+wget $ICEBERG_MAVEN_URL/iceberg-aws-bundle/$ICEBERG_VERSION/iceberg-aws-bundle-$ICEBERG_VERSION.jar
 
 # start Flink SQL client shell
 /path/to/bin/sql-client.sh embedded \
     -j iceberg-flink-runtime-$ICEBERG_VERSION.jar \
-    -j bundle-$AWS_SDK_VERSION.jar \
+    -j iceberg-aws-bundle-$ICEBERG_VERSION.jar \
     shell
 ```
 
@@ -562,7 +539,7 @@ The Glue, S3 and DynamoDB clients are then initialized with the assume-role cred
 Here is an example to start Spark shell with this client factory:
 
 ```shell
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime:{{% icebergVersion %}},software.amazon.awssdk:bundle:2.20.18 \
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
     --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
     --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
@@ -641,22 +618,17 @@ For versions before 6.5.0, you can use a [bootstrap action](https://docs.aws.ama
 ```sh
 #!/bin/bash
 
-AWS_SDK_VERSION=2.20.18
 ICEBERG_VERSION={{% icebergVersion %}}
 MAVEN_URL=https://repo1.maven.org/maven2
 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
-AWS_MAVEN_URL=$MAVEN_URL/software/amazon/awssdk
 # NOTE: this is just an example shared class path between Spark and Flink,
 # please choose a proper class path for production.
 LIB_PATH=/usr/share/aws/aws-java-sdk/
 
-AWS_PACKAGES=(
-  "bundle"
-)
-
 ICEBERG_PACKAGES=(
   "iceberg-spark-runtime-3.3_2.12"
   "iceberg-flink-runtime"
+  "iceberg-aws-bundle"
 )
 
 install_dependencies () {
@@ -671,7 +643,6 @@ install_dependencies () {
 }
 
 install_dependencies $LIB_PATH $ICEBERG_MAVEN_URL $ICEBERG_VERSION "${ICEBERG_PACKAGES[@]}"
-install_dependencies $LIB_PATH $AWS_MAVEN_URL $AWS_SDK_VERSION "${AWS_PACKAGES[@]}"
 ```
 
 ### AWS Glue
