sungwy opened a new pull request, #16435: URL: https://github.com/apache/iceberg/pull/16435
Issue: https://github.com/apache/iceberg/issues/13485

This PR fixes the packed dictionary INT96 timestamp decode path in `VectorizedParquetDefinitionLevelReader` so that it writes using byte offsets rather than row indexes.

The unit test constructs a dictionary with expected values, verifies that the decoded Arrow buffer contains the expected timestamp values, and verifies that writing multiple rows does not corrupt the data by placing values at the wrong offsets.

The fix is the same as the earlier proposed change discussed in #13486, and mirrors the offset handling used by other readers in the same class: https://github.com/apache/iceberg/blob/8e7ab3c881391487d3676fe31f53077e78f6375b/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java#L302

Disclosure:
- AI-assisted analysis and test drafting were used while investigating this issue. The final code and test were reviewed, edited, and validated manually. Refer: https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
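
For readers unfamiliar with the failure mode, here is a minimal sketch of why a row index cannot be used as a write position in a fixed-width buffer. It is not the Iceberg code: a plain `java.nio.ByteBuffer` stands in for the Arrow data buffer, and the class `Int96OffsetSketch`, its methods, and the `useByteOffsets` flag are hypothetical names chosen for illustration. The only assumptions taken from the Parquet format are that an INT96 timestamp is 12 little-endian bytes (8 bytes of nanoseconds within the day, then a 4-byte Julian day) and that the decoded value is stored as an 8-byte microsecond count.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96OffsetSketch {
  static final int LONG_WIDTH = 8;                      // bytes per decoded timestamp
  static final long UNIX_EPOCH_JULIAN_DAY = 2_440_588L; // Julian day of 1970-01-01
  static final long MICROS_PER_DAY = 86_400_000_000L;

  // Builds a 12-byte INT96 value: nanos-of-day (8 bytes) followed by Julian day (4 bytes).
  static byte[] encodeInt96(long nanosOfDay, int julianDay) {
    return ByteBuffer.allocate(12)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt(julianDay)
        .array();
  }

  // Converts a 12-byte INT96 value to microseconds since the Unix epoch.
  static long int96ToMicros(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong();
    int julianDay = buf.getInt();
    return (julianDay - UNIX_EPOCH_JULIAN_DAY) * MICROS_PER_DAY + nanosOfDay / 1_000L;
  }

  // Decodes dictionary ids into a fixed-width buffer of longs.
  // useByteOffsets = false reproduces the class of bug described above:
  // the row index is passed where a byte offset is expected.
  static ByteBuffer decode(byte[][] dictionary, int[] ids, boolean useByteOffsets) {
    ByteBuffer data =
        ByteBuffer.allocate(ids.length * LONG_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
    for (int row = 0; row < ids.length; row++) {
      long micros = int96ToMicros(dictionary[ids[row]]);
      int position = useByteOffsets ? row * LONG_WIDTH : row;
      data.putLong(position, micros); // absolute write at a byte position
    }
    return data;
  }

  public static void main(String[] args) {
    byte[][] dictionary = {
      encodeInt96(0L, 2_440_588),
      encodeInt96(1_000L, 2_440_589),
      encodeInt96(2_000L, 2_440_590)
    };
    int[] ids = {2, 0, 1};

    ByteBuffer correct = decode(dictionary, ids, true);
    ByteBuffer buggy = decode(dictionary, ids, false);
    for (int row = 0; row < ids.length; row++) {
      System.out.println(
          "row " + row
              + " byte-offset write: " + correct.getLong(row * LONG_WIDTH)
              + ", row-index write: " + buggy.getLong(row * LONG_WIDTH));
    }
  }
}
```

With a single row the two variants agree, because row 0 and byte offset 0 coincide. That is presumably why the test described above writes multiple rows: only then do the overlapping 8-byte writes overwrite each other.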
