ammarchalifah opened a new pull request, #16424:
URL: https://github.com/apache/iceberg/pull/16424

   ### Problem
   
   When a table is partitioned by `bucket(N, string_column)`, the bucket
   transform produces an `Integer` partition value. During Storage Partitioned
   Joins (SPJ), Spark reads partition values through `StructInternalRow`, whose
   `getUTF8StringInternal()` calls `struct.get(ordinal, CharSequence.class)`.
   This assumes the value is always a `CharSequence`, so reading the `Integer`
   bucket value fails with:
   
   ```
   IllegalArgumentException: Wrong class, expected java.lang.CharSequence, but was java.lang.Integer, for object: 1
   ```
   
   
   This affects any SPJ query (e.g. `MERGE INTO` or `JOIN`) on tables
   partitioned with `bucket(N, string_column)`.
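   To illustrate the failure mode, here is a minimal, self-contained sketch. It
   assumes a simplified stand-in for Iceberg's `StructLike` whose typed `get`
   rejects values of the wrong class with the same message as above;
   `TypedStruct`, `ArrayStruct`, and `OldStringReader` are hypothetical names,
   not Iceberg or Spark APIs. The identity slot holds a `String`, while the
   bucket slot holds an `Integer`:

   ```java
   // Hypothetical stand-in for Iceberg's StructLike: typed access by ordinal.
   interface TypedStruct {
     <T> T get(int pos, Class<T> javaClass);
   }

   // Hypothetical array-backed struct that validates the requested class,
   // mirroring the "Wrong class" error shown above.
   final class ArrayStruct implements TypedStruct {
     private final Object[] values;

     ArrayStruct(Object... values) {
       this.values = values.clone();
     }

     @Override
     public <T> T get(int pos, Class<T> javaClass) {
       Object value = values[pos];
       if (value == null || javaClass.isInstance(value)) {
         return javaClass.cast(value);
       }
       throw new IllegalArgumentException(
           "Wrong class, expected " + javaClass.getName()
               + ", but was " + value.getClass().getName()
               + ", for object: " + value);
     }
   }

   final class OldStringReader {
     // Pre-fix pattern (sketch): assumes every slot read as a string holds a CharSequence.
     static String readString(TypedStruct struct, int ordinal) {
       CharSequence seq = struct.get(ordinal, CharSequence.class);
       return seq == null ? null : seq.toString();
     }
   }
   ```

   Reading ordinal 0 of `new ArrayStruct("a", 1)` returns `"a"`, while reading
   ordinal 1 throws the `IllegalArgumentException` quoted above.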
   
   ### Fix
   
   Changed `getUTF8StringInternal()` to read the value with
   `struct.get(ordinal, Object.class)` instead of
   `struct.get(ordinal, CharSequence.class)` and then convert it with
   `value.toString()`. This follows the pattern already used by
   `getBinaryInternal()` in the same class, which uses `Object.class` to handle
   multiple possible runtime types.
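   The fixed read path can be sketched the same way. This is a minimal
   illustration, not the actual `StructInternalRow` code: `PartitionStruct`,
   `SimpleStruct`, and `FixedStringReader` are hypothetical names, the result is
   a plain `String` rather than Spark's `UTF8String`, and the null guard exists
   only in this sketch. Because every value is an instance of `Object`, the
   class check always passes and `toString()` turns the bucket `Integer` into
   its decimal text:

   ```java
   // Hypothetical stand-in for Iceberg's StructLike (not the real interface).
   interface PartitionStruct {
     <T> T get(int pos, Class<T> javaClass);
   }

   // Hypothetical struct that rejects values not matching the requested class.
   final class SimpleStruct implements PartitionStruct {
     private final Object[] values;

     SimpleStruct(Object... values) {
       this.values = values.clone();
     }

     @Override
     public <T> T get(int pos, Class<T> javaClass) {
       Object value = values[pos];
       if (value != null && !javaClass.isInstance(value)) {
         throw new IllegalArgumentException("Wrong class, expected " + javaClass.getName());
       }
       return javaClass.cast(value);
     }
   }

   final class FixedStringReader {
     // Post-fix pattern (sketch): read as Object, then stringify, so both a
     // String identity value and an Integer bucket value are accepted.
     static String readString(PartitionStruct struct, int ordinal) {
       Object value = struct.get(ordinal, Object.class);
       return value == null ? null : value.toString();
     }
   }
   ```

   With this pattern a `String` value passes through unchanged, so the existing
   identity-partition path keeps its behavior.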
   
   The fix is applied to all Spark versions: 3.4, 3.5, 4.0, and 4.1.
   
   ### Testing
   
   - Added `testJoinsWithBucketingOnStringColumn`, which uses the existing
     `checkJoin` helper to cover bucket-only partitioning on string columns.
   - Added `testJoinsWithIdentityAndBucketOnStringColumn` as a targeted
     regression test for the exact scenario from the issue: identity + bucket
     partitioning on a string column with an SPJ join.
   
   Both tests are added to all four Spark versions.
   
   ### Notes
   
   AI tools were used to assist with drafting this change. I have reviewed and
   validated the logic, tests, and code style end-to-end.
   
   Closes #15349


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

