[PR] [python] Fix read crash after widening column type change on non-partitioned tables [paimon]

via GitHub Mon, 01 Jun 2026 23:38:02 -0700


JunRuiLee opened a new pull request, #8073:
URL: https://github.com/apache/paimon/pull/8073


   ## Problem
   
   Reading a non-partitioned table after a widening column type change (e.g.
   `INT -> BIGINT`) crashes with an Arrow schema mismatch:
   
   ```
   ArrowInvalid: Schema at index 1 was different:
   user_id: int32   (file written before the type change)
   vs
   user_id: int64   (file written after)
   ```
   
   When a table has no partition keys and the read needs no column reordering,
   `DataFileBatchReader` returns the format reader's batch as-is, so columns from
   older-schema files keep their original physical types and fail to concatenate
   with newer-schema batches.
   
   The reorder/partition-padding path already aligns types via
   `RecordBatch.from_arrays(..., schema=...)`, which is why partitioned tables work
   and this gap went unnoticed.
   
   ## Fix
   
   Apply the same type alignment on the no-rebuild fast path, so both paths
   materialize types consistently. The batch is only rebuilt when types actually
   differ, keeping the common non-evolution read zero-copy.
   


