HelloJowet opened a new issue, #1724: URL: https://github.com/apache/sedona/issues/1724
## Expected behavior

Data should be inserted into the Iceberg table without serialisation errors when using Sedona together with Iceberg.

## Actual behavior

The `INSERT INTO` operation fails with a Kryo serialisation exception. The stack trace shows an `IndexOutOfBoundsException` raised inside the Kryo serializer while it handles Iceberg's `GenericDataFile` and `SparkWrite.TaskCommit` objects.

Error message:

> py4j.protocol.Py4JJavaError: An error occurred while calling o55.sql.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index 44 out of bounds for length 14
> Serialization trace:
> partitionType (org.apache.iceberg.GenericDataFile)
> taskFiles (org.apache.iceberg.spark.source.SparkWrite$TaskCommit)
> writerCommitMessage (org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTaskResult)

## Steps to reproduce the problem

1. Configure Sedona and Iceberg with the following settings:

```py
from sedona.spark import SedonaContext

config = (
    SedonaContext.builder()
    .master('spark://localhost:5581')
    .config(
        'spark.jars.packages',
        'org.apache.sedona:sedona-spark-3.5_2.12:1.7.0,'
        'org.datasyslab:geotools-wrapper:1.7.0-28.5,'
        'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,'
        'org.apache.iceberg:iceberg-aws-bundle:1.7.1,'
        'org.postgresql:postgresql:42.7.4',
    )
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.kryo.registrator', 'org.apache.sedona.core.serde.SedonaKryoRegistrator')
    .config('spark.sql.catalog.my_catalog', 'org.apache.iceberg.spark.SparkCatalog')
    .config('spark.sql.catalog.my_catalog.type', 'jdbc')
    .config('spark.sql.catalog.my_catalog.uri', 'jdbc:postgresql://localhost:5500/data_catalog_apache_iceberg')
    .config('spark.sql.catalog.my_catalog.jdbc.user', 'postgres')
    .config('spark.sql.catalog.my_catalog.jdbc.password', 'postgres')
    .config('spark.sql.catalog.my_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
    .config('spark.sql.catalog.my_catalog.warehouse', 's3a://data-lakehouse')
    .config('spark.sql.catalog.my_catalog.s3.endpoint', 'http://localhost:5561')
    .config('spark.sql.catalog.my_catalog.s3.access-key-id', 'admin')
    .config('spark.sql.catalog.my_catalog.s3.secret-access-key', 'password')
    .getOrCreate()
)
sedona = SedonaContext.create(config)
```

2. Execute the following queries:

```py
sedona.sql('CREATE TABLE my_catalog.table2 (name string) USING iceberg;')
sedona.sql("INSERT INTO my_catalog.table2 VALUES ('Alex'), ('Dipankar'), ('Jason')")
```
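One way to narrow this down (an assumption on my part, not a confirmed fix) is to build the same Sedona context with the `spark.serializer` and `spark.kryo.registrator` settings omitted, so Spark falls back to its default `JavaSerializer`. If the insert then succeeds, the conflict is between the Kryo configuration and Iceberg's task-commit objects rather than the Sedona jars themselves. The table name `table3` below is hypothetical:

```py
# Hypothetical diagnostic: the same Sedona setup as above, minus
# spark.serializer and spark.kryo.registrator (assumption: default
# JavaSerializer sidesteps the Kryo failure).
from sedona.spark import SedonaContext

builder = (
    SedonaContext.builder()
    .master('spark://localhost:5581')
    .config(
        'spark.jars.packages',
        'org.apache.sedona:sedona-spark-3.5_2.12:1.7.0,'
        'org.datasyslab:geotools-wrapper:1.7.0-28.5,'
        'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,'
        'org.apache.iceberg:iceberg-aws-bundle:1.7.1,'
        'org.postgresql:postgresql:42.7.4',
    )
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
)

# Same Iceberg catalog settings as in the reproduction above.
catalog_conf = {
    'spark.sql.catalog.my_catalog': 'org.apache.iceberg.spark.SparkCatalog',
    'spark.sql.catalog.my_catalog.type': 'jdbc',
    'spark.sql.catalog.my_catalog.uri': 'jdbc:postgresql://localhost:5500/data_catalog_apache_iceberg',
    'spark.sql.catalog.my_catalog.jdbc.user': 'postgres',
    'spark.sql.catalog.my_catalog.jdbc.password': 'postgres',
    'spark.sql.catalog.my_catalog.io-impl': 'org.apache.iceberg.aws.s3.S3FileIO',
    'spark.sql.catalog.my_catalog.warehouse': 's3a://data-lakehouse',
    'spark.sql.catalog.my_catalog.s3.endpoint': 'http://localhost:5561',
    'spark.sql.catalog.my_catalog.s3.access-key-id': 'admin',
    'spark.sql.catalog.my_catalog.s3.secret-access-key': 'password',
}
for key, value in catalog_conf.items():
    builder = builder.config(key, value)

sedona = SedonaContext.create(builder.getOrCreate())

# If this succeeds where the original fails, the Kryo configuration is the trigger.
sedona.sql('CREATE TABLE my_catalog.table3 (name string) USING iceberg;')
sedona.sql("INSERT INTO my_catalog.table3 VALUES ('Alex'), ('Dipankar'), ('Jason')")
```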
## Additional information

If I perform the same operations using Spark without Sedona, everything works seamlessly:

```py
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master('spark://localhost:5581')
    .config(
        'spark.jars.packages',
        'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,'
        'org.apache.iceberg:iceberg-aws-bundle:1.7.1,'
        'org.postgresql:postgresql:42.7.4',
    )
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
    .config('spark.sql.catalog.my_catalog', 'org.apache.iceberg.spark.SparkCatalog')
    .config('spark.sql.catalog.my_catalog.type', 'jdbc')
    .config('spark.sql.catalog.my_catalog.uri', 'jdbc:postgresql://localhost:5500/data_catalog_apache_iceberg')
    .config('spark.sql.catalog.my_catalog.jdbc.user', 'postgres')
    .config('spark.sql.catalog.my_catalog.jdbc.password', 'postgres')
    .config('spark.sql.catalog.my_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
    .config('spark.sql.catalog.my_catalog.warehouse', 's3a://data-lakehouse')
    .config('spark.sql.catalog.my_catalog.s3.endpoint', 'http://localhost:5561')
    .config('spark.sql.catalog.my_catalog.s3.access-key-id', 'admin')
    .config('spark.sql.catalog.my_catalog.s3.secret-access-key', 'password')
    .getOrCreate()
)

spark.sql('CREATE TABLE my_catalog.table8 (name string) USING iceberg;')
spark.sql("INSERT INTO my_catalog.table8 VALUES ('Alex'), ('Dipankar'), ('Jason')")
```

## Settings

Sedona version = 1.7.1
Apache Spark version = 3.5
API type = Python
Scala version = 2.12
JRE version = 11.0.25
Python version = 3.12.0
Environment = Standalone
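A further middle-ground check that might help triage (my addition, not something I have verified; the table name `table9` is hypothetical): run the plain Spark + Iceberg setup from "Additional information" but with `spark.serializer` set to `KryoSerializer` and no Sedona jars or registrator. If this also fails, the problem would be Kryo with Iceberg task commits in general; if it passes, the `SedonaKryoRegistrator` is more clearly implicated.

```py
# Hypothetical check: plain Spark + Iceberg with KryoSerializer enabled,
# but without the Sedona packages or SedonaKryoRegistrator.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master('spark://localhost:5581')
    .config(
        'spark.jars.packages',
        'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,'
        'org.apache.iceberg:iceberg-aws-bundle:1.7.1,'
        'org.postgresql:postgresql:42.7.4',
    )
    # Kryo enabled on its own, to isolate it from the Sedona registrator.
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
    .config('spark.sql.catalog.my_catalog', 'org.apache.iceberg.spark.SparkCatalog')
    .config('spark.sql.catalog.my_catalog.type', 'jdbc')
    .config('spark.sql.catalog.my_catalog.uri', 'jdbc:postgresql://localhost:5500/data_catalog_apache_iceberg')
    .config('spark.sql.catalog.my_catalog.jdbc.user', 'postgres')
    .config('spark.sql.catalog.my_catalog.jdbc.password', 'postgres')
    .config('spark.sql.catalog.my_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
    .config('spark.sql.catalog.my_catalog.warehouse', 's3a://data-lakehouse')
    .config('spark.sql.catalog.my_catalog.s3.endpoint', 'http://localhost:5561')
    .config('spark.sql.catalog.my_catalog.s3.access-key-id', 'admin')
    .config('spark.sql.catalog.my_catalog.s3.secret-access-key', 'password')
    .getOrCreate()
)

spark.sql('CREATE TABLE my_catalog.table9 (name string) USING iceberg;')
spark.sql("INSERT INTO my_catalog.table9 VALUES ('Alex'), ('Dipankar'), ('Jason')")
```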