xndai opened a new pull request, #16343:
URL: https://github.com/apache/iceberg/pull/16343

   …motion with INT logical type
   
   Fix a ClassCastException ("BigIntVector cannot be cast to IntVector") thrown
   by the vectorized reader when reading Parquet files whose column carries the
   INT(32, true) logical type annotation after that column has been promoted
   from int to long.
   
   The vectorized reader's LogicalTypeVisitor now allocates vectors based on
   the Parquet physical type instead of deriving them from the (potentially
   promoted) Iceberg schema type.
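   A minimal sketch of that allocation rule, using simplified stand-in types
   rather than the real Iceberg or Arrow classes (`PhysicalType`, `IcebergType`,
   `VectorKind`, and both allocator methods are hypothetical). The vector kind
   is chosen from the Parquet physical type, so a column promoted to long in the
   Iceberg schema still gets a 32-bit vector when the file stores INT32:

```java
// Hypothetical, simplified model for illustration; these are not Iceberg or Arrow APIs.
enum PhysicalType { INT32, INT64 }

enum IcebergType { INT, LONG }

enum VectorKind { INT_VECTOR, BIGINT_VECTOR }

final class VectorAllocation {
  private VectorAllocation() {}

  // Pre-fix behaviour (sketch): derive the vector from the possibly promoted Iceberg type.
  static VectorKind fromSchemaType(IcebergType expected) {
    return expected == IcebergType.LONG ? VectorKind.BIGINT_VECTOR : VectorKind.INT_VECTOR;
  }

  // Post-fix behaviour (sketch): derive the vector from the physical type stored in the file.
  static VectorKind fromPhysicalType(PhysicalType physical) {
    switch (physical) {
      case INT32:
        return VectorKind.INT_VECTOR;
      case INT64:
        return VectorKind.BIGINT_VECTOR;
      default:
        throw new IllegalArgumentException("Unsupported physical type: " + physical);
    }
  }
}
```

   For a file written before the promotion, the schema-based rule yields
   BIGINT_VECTOR for a column whose pages hold 32-bit values, while the
   physical-type rule yields INT_VECTOR.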
   
   Root Cause:
   In VectorizedArrowReader.allocateFieldVector(), the Arrow field was created
   from the Iceberg schema type, which reflects the promoted LongType, so a
   BigIntVector was allocated. The LogicalTypeVisitor then cast this vector to
   IntVector because the Parquet file annotates the column as INT(32), which
   raised the ClassCastException.
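   The two-step mismatch can be reproduced in isolation with hypothetical
   stand-ins for Arrow's vector classes (`SimpleVector`, `SimpleIntVector`,
   `SimpleBigIntVector`, and `RootCauseDemo` are illustrative only, not the
   org.apache.arrow API): allocating from the promoted schema produces a 64-bit
   vector, and a visitor keyed on the file's INT(32) annotation then downcasts
   it to a 32-bit vector:

```java
// Hypothetical stand-ins for Arrow vectors; not the real org.apache.arrow classes.
abstract class SimpleVector {}

final class SimpleIntVector extends SimpleVector {
  final int[] values;

  SimpleIntVector(int size) {
    values = new int[size];
  }
}

final class SimpleBigIntVector extends SimpleVector {
  final long[] values;

  SimpleBigIntVector(int size) {
    values = new long[size];
  }
}

final class RootCauseDemo {
  private RootCauseDemo() {}

  // Step 1 (pre-fix sketch): the vector is allocated from the promoted schema type (long).
  static SimpleVector allocateFromPromotedSchema(int size) {
    return new SimpleBigIntVector(size);
  }

  // Step 2 (sketch): a visitor keyed on the INT(32) annotation assumes a 32-bit vector.
  static SimpleIntVector visitInt32Annotation(SimpleVector vector) {
    return (SimpleIntVector) vector; // throws ClassCastException for a SimpleBigIntVector
  }

  // Returns true if combining the two steps fails with the mismatch described above.
  static boolean reproducesMismatch() {
    try {
      visitInt32Annotation(allocateFromPromotedSchema(4));
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }
}
```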
   
   The non-vectorized reader (BaseParquetReaders) already handles this case
   correctly: it checks the expected Iceberg type and uses IntAsLongReader to
   promote the values. The vectorized reader instead relies on the accessor
   layer for widening (IntAccessor.getLong() widens int to long), so the vector
   itself only needs to match the physical data layout, which is what this fix
   ensures.
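   A rough sketch of accessor-side widening. The PR only states that
   IntAccessor.getLong() widens int to long; the `LongReadable` interface and
   `Int32Accessor` class below are hypothetical simplifications over a plain
   int[] buffer. The buffer keeps the physical 32-bit layout, and a consumer
   that asks for a long still receives the widened value:

```java
// Hypothetical accessor shapes for illustration; Iceberg's real accessors read Arrow vectors.
interface LongReadable {
  long getLong(int rowId);
}

final class Int32Accessor implements LongReadable {
  private final int[] physicalValues; // data as stored in the file: 32-bit signed ints

  Int32Accessor(int[] physicalValues) {
    this.physicalValues = physicalValues;
  }

  int getInt(int rowId) {
    return physicalValues[rowId];
  }

  // Widening happens at read time, not at vector-allocation time.
  // Java's int-to-long conversion sign-extends, so negative values are preserved.
  @Override
  public long getLong(int rowId) {
    return physicalValues[rowId];
  }
}
```

   Because the int-to-long conversion sign-extends, this widening is
   consistent with the signed INT(32, true) annotation.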
   
   Tests:
   - testIntToLongPromotionWithLogicalType: verifies reading after promotion
     when the file has the INT(32, true) annotation (the reported crash)
   - testIntToLongPromotionWithoutLogicalType: verifies reading after promotion
     when the file has a bare INT32 column
   
   Fixes #16341


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

