kevinjqliu commented on issue #1449:
URL: https://github.com/apache/iceberg-python/issues/1449#issuecomment-2557247096

   I have not tested this personally, but from reading the AWS blog on connecting Spark to the AWS Glue Iceberg REST catalog, some of the configurations are different from what I would expect.
   
   https://aws.amazon.com/blogs/big-data/read-and-write-s3-iceberg-table-using-aws-glue-iceberg-rest-catalog-from-open-source-apache-spark/
   
   ```
   import sys
   import os
   import time
   from pyspark.sql import SparkSession
   
   # Replace <aws_region> with the AWS region name.
   # Replace <aws_account_id> with the AWS account ID.
   
   spark = SparkSession.builder.appName('osspark') \
       .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160') \
       .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
       .config('spark.sql.defaultCatalog', 'spark_catalog') \
       .config('spark.sql.catalog.spark_catalog', 'org.apache.iceberg.spark.SparkCatalog') \
       .config('spark.sql.catalog.spark_catalog.type', 'rest') \
       .config('spark.sql.catalog.spark_catalog.uri', 'https://glue.<aws_region>.amazonaws.com/iceberg') \
       .config('spark.sql.catalog.spark_catalog.warehouse', '<aws_account_id>') \
       .config('spark.sql.catalog.spark_catalog.rest.sigv4-enabled', 'true') \
       .config('spark.sql.catalog.spark_catalog.rest.signing-name', 'glue') \
       .config('spark.sql.catalog.spark_catalog.rest.signing-region', '<aws_region>') \
       .config('spark.sql.catalog.spark_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO') \
       .config('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider') \
       .config('spark.sql.catalog.spark_catalog.rest-metrics-reporting-enabled', 'false') \
       .getOrCreate()
   ```
   
   Specifically, notice the `warehouse` parameter, which is set to the AWS account ID. It might help to try replicating what this Spark config is doing; a sketch of the equivalent PyIceberg setup follows below.
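   
   As a rough, untested sketch (not verified against Glue), the same settings mapped onto PyIceberg's REST catalog and SigV4 properties might look like the following. The catalog name `glue_rest` is arbitrary, and `<aws_region>` / `<aws_account_id>` are placeholders as above:
   
   ```
   from pyiceberg.catalog import load_catalog
   
   # Properties mirror the Spark catalog config from the AWS blog post.
   catalog = load_catalog(
       "glue_rest",
       **{
           "type": "rest",
           "uri": "https://glue.<aws_region>.amazonaws.com/iceberg",
           "warehouse": "<aws_account_id>",   # AWS account ID, as in the Spark `warehouse` setting
           "rest.sigv4-enabled": "true",      # sign REST requests with SigV4
           "rest.signing-name": "glue",
           "rest.signing-region": "<aws_region>",
       },
   )
   
   # Quick smoke test: list namespaces visible through the Glue REST endpoint.
   print(catalog.list_namespaces())
   ```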


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

