Re: [PR] Spark: Fix type mismatch in SPJ with bucket partition key on string column [iceberg]

via GitHub Thu, 28 May 2026 03:46:52 -0700


ammarchalifah commented on PR #16424:
URL: https://github.com/apache/iceberg/pull/16424#issuecomment-4563261811


   I reproduced the behaviour in a test suite using vanilla Spark and a Hadoop
   catalog. The error message is slightly different, but the gist of the issue
   is the same:
   
   ```
   java.lang.IllegalStateException: Unknown type for int field. Type name: java.lang.String
           at org.apache.iceberg.spark.source.StructInternalRow.getInt(StructInternalRow.java:138)
   ```
   
   This happens in both `SELECT` and `MERGE INTO` when the join keys are a
   subset of the partition keys. The partition keys are
   `identity(a), bucket(b), bucket(c)`, while the join keys are `b` and `c`.
   
   When the partition fields are reordered, e.g. to
   `bucket(b), bucket(c), identity(a)`, the test suite passes, which suggests
   the issue lies in ordinal matching between Iceberg's partition field order
   and the order Spark expects. @RussellSpitzer, it seems you were right that
   this is a Spark-side bug rather than an Iceberg bug.
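
To make the suspected ordinal mismatch concrete, here is a minimal toy model in Python. It is not Iceberg or Spark code: `ToyPartitionRow`, `read_bucket_keys`, and the sample values are hypothetical, and the assumption that the reader looks up the join keys `b` and `c` at positions 0 and 1 is only a guess at the failure mode, not something confirmed by the stack trace.

```python
class ToyPartitionRow:
    """Hypothetical stand-in for a row of partition values with typed getters."""

    def __init__(self, values):
        self._values = list(values)

    def get_int(self, ordinal):
        value = self._values[ordinal]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise TypeError(
            f"Unknown type for int field. Type name: {type(value).__name__}"
        )


def read_bucket_keys(row, ordinals):
    """Read bucket ids (ints) at the positions the reader believes hold them."""
    return [row.get_int(i) for i in ordinals]


# Partition values laid out as identity(a), bucket(b), bucket(c).
failing_row = ToyPartitionRow(["some-a-value", 3, 7])

# Partition values laid out as bucket(b), bucket(c), identity(a).
passing_row = ToyPartitionRow([3, 7, "some-a-value"])

# ASSUMPTION: the reader treats the join keys (b, c) as positions 0 and 1,
# ignoring where they actually sit in the partition spec.
assumed_ordinals = [0, 1]
```

Under that assumption, `read_bucket_keys(passing_row, assumed_ordinals)` returns the two bucket ids, while the same call on `failing_row` raises because position 0 holds the string value of `identity(a)`. That would match both the error above and the observation that reordering the partition fields makes the tests pass.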


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

