price-qian opened a new pull request, #5555: URL: https://github.com/apache/iceberg/pull/5555
Compiled `iceberg-spark-runtime-3.1` and tested on Glue 3.0. The job is able to read from a DEMO S3 bucket and write into an S3 bucket that lives in my account. The DEMO bucket is from this blog: https://aws.amazon.com/blogs/big-data/build-an-apache-iceberg-data-lake-using-amazon-athena-amazon-emr-and-aws-glue/

The Spark job I used during testing is listed below.

```
import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object IcebergSparkSQL {
  def main(sysArgs: Array[String]): Unit = {
    val sparkContext: SparkContext = new SparkContext()
    val spark: SparkSession = SparkSession.builder
      .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.demo.warehouse", "s3://iceberg-test-puzhen/glueiceberg")
      .config("spark.sql.catalog.demo.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
      .config("spark.sql.catalog.demo.s3.acceleration-enabled", "true")
      .getOrCreate()

    // create database
    spark.sql("CREATE DATABASE IF NOT EXISTS demo.reviews")

    // load data (Amazon reviews public dataset)
    val book_reviews_location = "s3://amazon-reviews-pds/parquet/product_category=Books/*.parquet"
    val book_reviews = spark.read.parquet(book_reviews_location)
    book_reviews.writeTo("demo.reviews.book_reviews2")
      .tableProperty("format-version", "2")
      .createOrReplace()

    // read using SQL
    spark.sql("SELECT * FROM demo.reviews.book_reviews2").show()
  }
}
```

Job runtime duration:
- Workload run without S3 acceleration: 3 minutes 21 seconds
- Workload run with S3 acceleration: 3 minutes 14 seconds

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
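For reference, the same catalog settings (including the `s3.acceleration-enabled` property this PR adds) can also be supplied as submit-time configuration instead of being hard-coded in the job. A minimal sketch; the jar name is hypothetical and the property values are the ones from the test job above:

```shell
# Sketch: passing the Iceberg GlueCatalog configuration via spark-submit
# rather than SparkSession.builder. The application jar path is a placeholder.
spark-submit \
  --class IcebergSparkSQL \
  --conf spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.demo.warehouse=s3://iceberg-test-puzhen/glueiceberg \
  --conf spark.sql.catalog.demo.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
  --conf spark.sql.catalog.demo.s3.acceleration-enabled=true \
  IcebergSparkSQL.jar
```

This keeps the acceleration toggle out of the application code, so the same job can be rerun with and without acceleration for timing comparisons like the one above.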
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
