[PR] fix: Cast smaller integer types to int32/int64 on write for Spark compatibility [iceberg-python]

via GitHub Mon, 08 Dec 2025 09:38:20 -0800


somasays opened a new pull request, #2799:
URL: https://github.com/apache/iceberg-python/pull/2799


   ## Rationale
   
   Fixes #2791
   
   When writing PyArrow tables with smaller integer types (uint8, int8, int16, uint16)
   to Iceberg tables with IntegerType columns, PyIceberg preserves the original Arrow
   type in the Parquet file. This causes Spark to fail with:
   
   ```
   java.lang.UnsupportedOperationException: Unsupported logical type: UINT_8
   ```
   
   The fix casts smaller integer types to their canonical Iceberg representation
   (int32 for IntegerType, int64 for LongType) during write, ensuring cross-platform
   compatibility.
   
   ## Changes
   
   - Added integer type widening logic in `ArrowProjectionVisitor._cast_if_needed()`,
     following the same pattern as the existing timestamp handling
   - Only widening conversions are allowed (e.g., uint8 → int32, int32 → int64)
   - Narrowing conversions continue to be rejected via the existing `promote()` function
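
The widening rule in the list above can be sketched as a range check: a conversion counts as widening when every value of the source type fits in the target type. The code below is a standalone illustration in pure Python. The names `IntType`, `INT_TYPES`, `value_range`, and `is_widening` are hypothetical and do not come from PyIceberg, and the range-containment criterion is an assumption of this sketch rather than a description of how `_cast_if_needed()` or `promote()` decide.

```python
from typing import NamedTuple, Tuple


class IntType(NamedTuple):
    bits: int
    signed: bool


# Hypothetical lookup of Arrow-style integer type names (illustration only).
INT_TYPES = {
    "int8": IntType(8, True),
    "int16": IntType(16, True),
    "int32": IntType(32, True),
    "int64": IntType(64, True),
    "uint8": IntType(8, False),
    "uint16": IntType(16, False),
    "uint32": IntType(32, False),
    "uint64": IntType(64, False),
}


def value_range(t: IntType) -> Tuple[int, int]:
    """Return the inclusive (min, max) values representable by t."""
    if t.signed:
        return -(1 << (t.bits - 1)), (1 << (t.bits - 1)) - 1
    return 0, (1 << t.bits) - 1


def is_widening(source: str, target: str) -> bool:
    """True if target differs from source and can hold every source value."""
    if source == target:
        return False
    s_lo, s_hi = value_range(INT_TYPES[source])
    t_lo, t_hi = value_range(INT_TYPES[target])
    return t_lo <= s_lo and s_hi <= t_hi


print(is_widening("uint8", "int32"))   # True
print(is_widening("int32", "int64"))   # True
print(is_widening("int64", "int32"))   # False
```

Under this criterion, uint32 → int32 is not a widening conversion, because the upper half of the uint32 range does not fit in int32. How the PR treats that case is not stated in the description.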
   
   ## Testing
   
   - Added parameterized tests for integer type casting (`test__to_requested_schema_integer_promotion`)
   - Verified existing `test_projection_filter_add_column_demote` still works (narrowing rejection)
   - All 3041 tests pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

