JonasJ-ap commented on code in PR #7119:
URL: https://github.com/apache/iceberg/pull/7119#discussion_r1137904028


##########
docs/aws.md:
##########
@@ -60,7 +60,7 @@ AWS_SDK_VERSION=2.20.18
 AWS_MAVEN_GROUP=software.amazon.awssdk
 AWS_PACKAGES=(
     "bundle"
-    "url-connection-client"
+    "apache-client"

Review Comment:
   It seems that `apache-client` is included in the `bundle`. I ran
   ```bash
   aws_jars jar -tf bundle-2.20.18.jar | egrep "http/apache"
   ```
   and got
   ```bash
   ...
   software/amazon/awssdk/http/apache/ProxyConfiguration$Builder.class
   software/amazon/awssdk/http/apache/ProxyConfiguration$DefaultClientProxyConfigurationBuilder.class
   software/amazon/awssdk/http/apache/ProxyConfiguration$1.class
   software/amazon/awssdk/http/apache/ProxyConfiguration.class
   software/amazon/awssdk/http/apache/ApacheSdkHttpService.class
   software/amazon/awssdk/http/apache/ApacheHttpClient$1.class
   software/amazon/awssdk/http/apache/ApacheHttpClient$Builder.class
   software/amazon/awssdk/http/apache/ApacheHttpClient$DefaultBuilder.class
   software/amazon/awssdk/http/apache/ApacheHttpClient$ApacheConnectionManagerFactory$1.class
   software/amazon/awssdk/http/apache/ApacheHttpClient$ApacheConnectionManagerFactory.class
   software/amazon/awssdk/http/apache/ApacheHttpClient.class
   ```
   I also verified this with the following script:
   ```sh
   BRANCH_NAME=s3_credentials
   DEPENDENCIES=""
   
   # add AWS dependency
   AWS_SDK_VERSION=2.17.257
   AWS_MAVEN_GROUP=software.amazon.awssdk
   AWS_PACKAGES=(
       "bundle"
   )
   for pkg in "${AWS_PACKAGES[@]}"; do
       DEPENDENCIES+="$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION,"
   done
   
   JARS="iceberg-spark-runtime-3.3_$BRANCH_NAME.jar"
   
   # start Spark SQL client shell
   spark-shell --packages=$DEPENDENCIES --jars=$JARS \
       --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
       --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
       --conf spark.sql.catalog.demo.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
       --conf spark.sql.catalog.demo.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
       --conf spark.sql.catalog.demo.warehouse=s3://gluetestjonas/warehouse \
       --conf spark.sql.catalog.demo.http-client.type=apache \
       --conf spark.sql.catalog.demo.client.region=us-east-1
   ```
   With only `bundle` on the classpath, I was able to start a Spark shell and create and write to tables in AWS Glue.
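
The jar check above can be wrapped in a small helper so it is easy to repeat for other SDK versions. This is only a minimal sketch: `has_package` is a hypothetical name, and it reads a jar listing (as printed by `jar -tf`) from stdin rather than opening the jar itself.

```sh
# Hypothetical helper (not part of any tool): succeeds if the jar listing
# on stdin contains at least one .class file under the given package path.
has_package() {
    grep -q "^${1}/.*[.]class\$"
}

# Usage, assuming bundle-2.20.18.jar is in the current directory:
#   jar -tf bundle-2.20.18.jar | has_package software/amazon/awssdk/http/apache \
#       && echo "apache-client classes are in the bundle"
```

Reading from stdin keeps the helper independent of how the listing is produced, so `unzip -Z1` would work as a source too.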



##########
python/dev/Dockerfile:
##########
@@ -47,12 +47,12 @@ RUN curl -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runt
  && mv iceberg-spark-runtime-3.3_2.12-1.1.0.jar /opt/spark/jars
 
 # Download Java AWS SDK
-RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.17.165/bundle-2.17.165.jar -Lo bundle-2.17.165.jar \
- && mv bundle-2.17.165.jar /opt/spark/jars
+RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.20.8/bundle-2.20.8.jar -Lo bundle-2.20.8.jar \
+ && mv bundle-2.20.8.jar /opt/spark/jars
 
 # Download URL connection client required for S3FileIO
-RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.17.165/url-connection-client-2.17.165.jar -Lo url-connection-client-2.17.165.jar \
- && mv url-connection-client-2.17.165.jar /opt/spark/jars
+RUN curl -s https://repo1.maven.org/maven2/software/amazon/awssdk/apache-client/2.20.8/apache-client-2.20.8.jar -Lo apache-client-2.20.8.jar \
+ && mv apache-client-2.20.8.jar /opt/spark/jars

Review Comment:
   I think we may still want to include `url-connection-client` here, since this Dockerfile uses the Apache Iceberg 1.1.0 release rather than the master branch:
   
   https://github.com/apache/iceberg/blob/08078571b7560c39da2fe087db8a873920a4ba78/python/dev/Dockerfile#L45-L48
   
   (This is probably why the current Python integration is failing.)
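
One way to keep `url-connection-client` next to the bundle is sketched below. It is a rough illustration only: `aws_sdk_jar_url` is a hypothetical helper, the URL layout is copied from the `curl` lines in the diff, and the loop prints the download commands without running them.

```sh
# Hypothetical helper: print the Maven Central URL of an AWS SDK v2 jar,
# following the URL layout used by the curl commands in the Dockerfile.
aws_sdk_jar_url() {
    artifact="$1"
    version="$2"
    echo "https://repo1.maven.org/maven2/software/amazon/awssdk/${artifact}/${version}/${artifact}-${version}.jar"
}

# Dry run: list the downloads for the bundle and url-connection-client.
# The version is the one proposed in the diff; adjust as needed.
for artifact in bundle url-connection-client; do
    echo "curl -s $(aws_sdk_jar_url "$artifact" 2.20.8) -Lo ${artifact}-2.20.8.jar"
done
```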



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

