hackintoshrao opened a new pull request, #686: URL: https://github.com/apache/iceberg-go/pull/686
## Summary

Parquet decimals can be stored using multiple physical types depending on precision:

- `INT32` for precision <= 9
- `INT64` for precision <= 18
- `FIXED_LEN_BYTE_ARRAY` for any precision
- `BYTE_ARRAY` for any precision

The previous implementation only accepted `FIXED_LEN_BYTE_ARRAY` for all decimals and rejected valid Parquet files with the error:

```
unexpected physical type INT32 for decimal(7, 2), expected FIXED_LEN_BYTE_ARRAY
```

This caused `AddFiles` to fail when importing datasets (like TPC-DS) that use INT32/INT64 for small-precision decimals, which is valid per the Parquet specification.

## Changes

- Refactors `createStatsAgg` to switch on the Iceberg logical type first, then handle physical representations (matches iceberg-java's `ParquetConversions.java` approach)
- For `DecimalType`, accepts all valid Parquet physical types
- Updates `DataFileStatsFromMeta` to handle INT32/INT64 decimal statistics
- Adds `wrappedDecByteArrayStats` for BYTE_ARRAY-encoded decimals

## Test plan

- [x] Existing tests pass
- [x] Build succeeds
- [x] Tested with TPC-DS Parquet files that use INT32 decimals
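
For readers unfamiliar with the encodings, the physical-type rules above can be sketched in standalone Go. This is an illustration only, not the code in this PR: `physicalType`, `acceptsDecimal`, `unscaledFromStat`, and `fromTwosComplement` are hypothetical names and do not exist in iceberg-go or arrow-go. The sketch assumes statistics arrive as plain-encoded bytes, so INT32/INT64 values are little-endian and byte-array decimals are big-endian two's complement.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/big"
)

// physicalType is a hypothetical stand-in for the Parquet physical type enum.
type physicalType int

const (
	physInt32 physicalType = iota
	physInt64
	physFixedLenByteArray
	physByteArray
)

// acceptsDecimal reports whether a physical type may carry a decimal of the
// given precision: INT32 up to 9 digits, INT64 up to 18, byte arrays any.
func acceptsDecimal(pt physicalType, precision int) bool {
	switch pt {
	case physInt32:
		return precision >= 1 && precision <= 9
	case physInt64:
		return precision >= 1 && precision <= 18
	case physFixedLenByteArray, physByteArray:
		return precision >= 1
	}
	return false
}

// fromTwosComplement interprets b as a big-endian two's complement integer.
func fromTwosComplement(b []byte) *big.Int {
	v := new(big.Int).SetBytes(b)
	if len(b) > 0 && b[0]&0x80 != 0 {
		// Sign bit set: subtract 2^(8*len(b)) to obtain the negative value.
		v.Sub(v, new(big.Int).Lsh(big.NewInt(1), uint(len(b))*8))
	}
	return v
}

// unscaledFromStat decodes a raw min/max statistic into the decimal's
// unscaled integer, based on the column's physical type.
func unscaledFromStat(pt physicalType, raw []byte) (*big.Int, error) {
	switch pt {
	case physInt32:
		if len(raw) != 4 {
			return nil, fmt.Errorf("INT32 decimal stat must be 4 bytes, got %d", len(raw))
		}
		return big.NewInt(int64(int32(binary.LittleEndian.Uint32(raw)))), nil
	case physInt64:
		if len(raw) != 8 {
			return nil, fmt.Errorf("INT64 decimal stat must be 8 bytes, got %d", len(raw))
		}
		return big.NewInt(int64(binary.LittleEndian.Uint64(raw))), nil
	case physFixedLenByteArray, physByteArray:
		return fromTwosComplement(raw), nil
	}
	return nil, fmt.Errorf("unexpected physical type %d for decimal", pt)
}

func main() {
	// decimal(7, 2) value 123.45 stored as INT32: unscaled 12345, little-endian.
	v, err := unscaledFromStat(physInt32, []byte{0x39, 0x30, 0x00, 0x00})
	fmt.Println(v, err)

	// Value -1.23 at scale 2 stored as a 2-byte array: unscaled -123.
	v, err = unscaledFromStat(physFixedLenByteArray, []byte{0xFF, 0x85})
	fmt.Println(v, err)
}
```

Switching on the physical type only after the logical type is known to be a decimal mirrors the ordering described under Changes. The unscaled integer is then combined with the column's scale to form the decimal bound.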
