JonasJ-ap commented on code in PR #7119:
URL: https://github.com/apache/iceberg/pull/7119#discussion_r1137904028
##########
docs/aws.md:
##########
@@ -60,7 +60,7 @@ AWS_SDK_VERSION=2.20.18
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
"bundle"
- "url-connection-client"
+ "apache-client"
Review Comment:
It seems that `apache-client` is included in the `bundle`. I ran
```bash
jar -tf bundle-2.20.18.jar | egrep "http/apache"
```
and got
```bash
...
software/amazon/awssdk/http/apache/ProxyConfiguration$Builder.class
software/amazon/awssdk/http/apache/ProxyConfiguration$DefaultClientProxyConfigurationBuilder.class
software/amazon/awssdk/http/apache/ProxyConfiguration$1.class
software/amazon/awssdk/http/apache/ProxyConfiguration.class
software/amazon/awssdk/http/apache/ApacheSdkHttpService.class
software/amazon/awssdk/http/apache/ApacheHttpClient$1.class
software/amazon/awssdk/http/apache/ApacheHttpClient$Builder.class
software/amazon/awssdk/http/apache/ApacheHttpClient$DefaultBuilder.class
software/amazon/awssdk/http/apache/ApacheHttpClient$ApacheConnectionManagerFactory$1.class
software/amazon/awssdk/http/apache/ApacheHttpClient$ApacheConnectionManagerFactory.class
software/amazon/awssdk/http/apache/ApacheHttpClient.class
```
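For comparison, the same kind of check could be run for the url-connection client; the `http/urlconnection` path below is assumed from the SDK's `software.amazon.awssdk.http.urlconnection` package name, not something I re-verified here:
```bash
# look for the url-connection client classes in the same bundle jar
jar -tf bundle-2.20.18.jar | egrep "http/urlconnection"
```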
I also verified this with the following script:
```sh
BRANCH_NAME=s3_credentials

DEPENDENCIES=""
# add AWS dependency
AWS_SDK_VERSION=2.17.257
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
    "bundle"
)
for pkg in "${AWS_PACKAGES[@]}"; do
    DEPENDENCIES+="$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION,"
done

JARS="iceberg-spark-runtime-3.3_$BRANCH_NAME.jar"

# start Spark SQL client shell
spark-shell --packages=$DEPENDENCIES --jars=$JARS \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.demo.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.demo.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.demo.warehouse=s3://gluetestjonas/warehouse \
    --conf spark.sql.catalog.demo.http-client.type=apache \
    --conf spark.sql.catalog.demo.client.region=us-east-1
```
With this setup I could successfully spawn a Spark shell and create/write data to tables in AWS Glue.
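For reference, a smoke test along those lines could also be driven non-interactively through the `spark-sql` CLI with the same flags; this is a hypothetical sketch, not what I ran. The `db.sample` table name is made up, and it assumes a Glue database named `db` already exists:
```sh
# hypothetical smoke test, reusing $DEPENDENCIES and $JARS from the script above
spark-sql --packages=$DEPENDENCIES --jars=$JARS \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.demo.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.demo.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.demo.warehouse=s3://gluetestjonas/warehouse \
    --conf spark.sql.catalog.demo.http-client.type=apache \
    --conf spark.sql.catalog.demo.client.region=us-east-1 \
    -e "CREATE TABLE demo.db.sample (id bigint, data string) USING iceberg;
        INSERT INTO demo.db.sample VALUES (1, 'a'), (2, 'b');
        SELECT * FROM demo.db.sample;"
```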
##########
python/dev/Dockerfile:
##########
@@ -47,12 +47,12 @@ RUN curl -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runt
   && mv iceberg-spark-runtime-3.3_2.12-1.1.0.jar /opt/spark/jars
 # Download Java AWS SDK
-RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.17.165/bundle-2.17.165.jar -Lo bundle-2.17.165.jar \
-  && mv bundle-2.17.165.jar /opt/spark/jars
+RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.20.8/bundle-2.20.8.jar -Lo bundle-2.20.8.jar \
+  && mv bundle-2.20.8.jar /opt/spark/jars
 # Download URL connection client required for S3FileIO
-RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.17.165/url-connection-client-2.17.165.jar -Lo url-connection-client-2.17.165.jar \
-  && mv url-connection-client-2.17.165.jar /opt/spark/jars
+RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/apache-client/2.20.8/apache-client-2.20.8.jar -Lo apache-client-2.20.8.jar \
+  && mv apache-client-2.20.8.jar /opt/spark/jars
Review Comment:
I think we may still want to include `url-connection-client` here, since this Dockerfile uses the Apache Iceberg 1.1.0 release rather than the master branch:
https://github.com/apache/iceberg/blob/08078571b7560c39da2fe087db8a873920a4ba78/python/dev/Dockerfile#L45-L48
(probably the reason the current Python integration tests fail)
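If we do keep it, a minimal sketch of the extra download could look like the lines below; the 2.20.8 version is just copied from the hunk above, and whichever version pairs correctly with the 1.1.0 runtime jar would still need checking:
```dockerfile
# Keep the URL connection client that the Iceberg 1.1.0 runtime's S3FileIO presumably still needs
RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.20.8/url-connection-client-2.20.8.jar -Lo url-connection-client-2.20.8.jar \
  && mv url-connection-client-2.20.8.jar /opt/spark/jars
```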