hackintoshrao opened a new pull request, #686:
URL: https://github.com/apache/iceberg-go/pull/686

   ## Summary
   
   Parquet decimals can be stored using multiple physical types, depending on precision:
   - `INT32` for precision <= 9
   - `INT64` for precision <= 18  
   - `FIXED_LEN_BYTE_ARRAY` for any precision
   - `BYTE_ARRAY` for any precision
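
The precision rules above can be sketched as a small validity check. This is only an illustration, not the code in this PR: `PhysicalType`, its constants, and `validDecimalPhysicalType` are hypothetical names introduced here. The sketch also does not check that a `FIXED_LEN_BYTE_ARRAY` is long enough to hold the declared precision.

```go
package main

import "fmt"

// PhysicalType is a hypothetical stand-in for a Parquet physical type enum.
// It is NOT the type used by iceberg-go or arrow-go.
type PhysicalType string

const (
	Int32             PhysicalType = "INT32"
	Int64             PhysicalType = "INT64"
	FixedLenByteArray PhysicalType = "FIXED_LEN_BYTE_ARRAY"
	ByteArray         PhysicalType = "BYTE_ARRAY"
)

// validDecimalPhysicalType reports whether a decimal with the given precision
// may be stored using the given physical type, following the list above.
func validDecimalPhysicalType(pt PhysicalType, precision int) bool {
	if precision < 1 {
		return false
	}
	switch pt {
	case Int32:
		return precision <= 9
	case Int64:
		return precision <= 18
	case FixedLenByteArray, ByteArray:
		// Assumption: the array length check for FIXED_LEN_BYTE_ARRAY is omitted.
		return true
	default:
		return false
	}
}

func main() {
	// decimal(7, 2) stored as INT32 is the case from the error message.
	fmt.Println(validDecimalPhysicalType(Int32, 7))
	fmt.Println(validDecimalPhysicalType(Int32, 12))
	fmt.Println(validDecimalPhysicalType(Int64, 12))
}
```

With a check of this shape, a `decimal(7, 2)` column stored as `INT32` is accepted rather than rejected for not being `FIXED_LEN_BYTE_ARRAY`.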
   
   The previous implementation accepted only `FIXED_LEN_BYTE_ARRAY` for decimals 
and rejected valid Parquet files with the error:
   
   ```
   unexpected physical type INT32 for decimal(7, 2), expected FIXED_LEN_BYTE_ARRAY
   ```
   
   This caused `AddFiles` to fail when importing datasets (such as TPC-DS) that 
use INT32/INT64 for low-precision decimals, an encoding that is valid per the 
Parquet specification.
   
   ## Changes
   
   - Refactors `createStatsAgg` to switch on Iceberg logical type first, then 
handle physical representations (matches iceberg-java's 
`ParquetConversions.java` approach)
   - For `DecimalType`, accepts all valid Parquet physical types
   - Updates `DataFileStatsFromMeta` to handle INT32/INT64 decimal statistics
   - Adds `wrappedDecByteArrayStats` for BYTE_ARRAY encoded decimals
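
As a rough illustration of why each physical type needs its own handling, the sketch below decodes a decimal's unscaled value from raw statistic bytes. It is not the code in this PR, and the function names are hypothetical. It assumes the statistics follow Parquet plain encoding: little-endian for INT32/INT64, and big-endian two's complement for `BYTE_ARRAY` and `FIXED_LEN_BYTE_ARRAY` decimals.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/big"
)

// unscaledFromInt32Stat decodes a 4-byte little-endian INT32 statistic
// into the decimal's unscaled value. Hypothetical helper, not from the PR.
func unscaledFromInt32Stat(b []byte) (*big.Int, error) {
	if len(b) != 4 {
		return nil, fmt.Errorf("INT32 statistic must be 4 bytes, got %d", len(b))
	}
	return big.NewInt(int64(int32(binary.LittleEndian.Uint32(b)))), nil
}

// unscaledFromInt64Stat decodes an 8-byte little-endian INT64 statistic.
func unscaledFromInt64Stat(b []byte) (*big.Int, error) {
	if len(b) != 8 {
		return nil, fmt.Errorf("INT64 statistic must be 8 bytes, got %d", len(b))
	}
	return big.NewInt(int64(binary.LittleEndian.Uint64(b))), nil
}

// unscaledFromBytesStat decodes a big-endian two's complement value, the
// layout used for BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY decimals.
func unscaledFromBytesStat(b []byte) (*big.Int, error) {
	if len(b) == 0 {
		return nil, fmt.Errorf("byte array statistic must not be empty")
	}
	v := new(big.Int).SetBytes(b) // interprets b as an unsigned big-endian integer
	if b[0]&0x80 != 0 {
		// Sign bit set: subtract 2^(8*len) to recover the negative value.
		v.Sub(v, new(big.Int).Lsh(big.NewInt(1), uint(8*len(b))))
	}
	return v, nil
}

func main() {
	// 123.45 as decimal(7, 2) has the unscaled value 12345 (0x3039).
	fromInt32, _ := unscaledFromInt32Stat([]byte{0x39, 0x30, 0x00, 0x00})
	fromBytes, _ := unscaledFromBytesStat([]byte{0x30, 0x39})
	fmt.Println(fromInt32, fromBytes)
}
```

Both representations yield the same unscaled value; the scale from the Iceberg `DecimalType` is applied afterwards.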
   
   ## Test plan
   
   - [x] Existing tests pass
   - [x] Build succeeds
   - [x] Tested with TPC-DS Parquet files that use INT32 decimals

