sungwy opened a new pull request, #16435:
URL: https://github.com/apache/iceberg/pull/16435

   Issue: https://github.com/apache/iceberg/issues/13485
   
   This PR fixes the packed dictionary INT96 timestamp decode path in the 
`VectorizedParquetDefinitionLevelReader` to write using byte offsets rather 
than row indexes.
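
   To illustrate the class of bug, here is a minimal sketch, not the Iceberg 
code. It assumes 8-byte decoded timestamp values and uses a plain 
`java.nio.ByteBuffer` as a stand-in for the Arrow data buffer. The class and 
method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative only: a ByteBuffer stands in for the Arrow data buffer,
// and every name in this class is hypothetical.
final class OffsetSketch {
  static final int VALUE_WIDTH = 8; // assumed bytes per decoded timestamp

  // Buggy variant: passes the row index where a byte position is expected,
  // so consecutive 8-byte writes land one byte apart and overlap.
  static void writeAtRowIndex(ByteBuffer buf, int startRow, long[] values) {
    for (int i = 0; i < values.length; i++) {
      buf.putLong(startRow + i, values[i]);
    }
  }

  // Fixed variant: scales the row index by the value width to get a byte offset.
  static void writeAtByteOffset(ByteBuffer buf, int startRow, long[] values) {
    for (int i = 0; i < values.length; i++) {
      buf.putLong((startRow + i) * VALUE_WIDTH, values[i]);
    }
  }

  // Reads back `count` values, one per 8-byte slot.
  static long[] readAll(ByteBuffer buf, int count) {
    long[] out = new long[count];
    for (int i = 0; i < count; i++) {
      out[i] = buf.getLong(i * VALUE_WIDTH);
    }
    return out;
  }
}
```

   With a single row starting at row 0, both variants write to position 0 and 
produce the same result, so the difference only shows up once several rows are 
written.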
   
   The unit test builds a dictionary of known values, decodes multiple rows, and 
verifies that the decoded Arrow buffer contains the expected timestamps. 
Decoding more than one row confirms that values are no longer corrupted by 
being written at the wrong offsets.
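
   For readers who want to compute expected values by hand, the following 
sketch shows one way to build and decode an INT96 timestamp. It assumes the 
conventional Parquet INT96 layout (8 little-endian bytes of nanoseconds within 
the day, then a 4-byte little-endian Julian day) and a target of microseconds 
since the Unix epoch. The helper names are hypothetical, and the test in this 
PR may construct its values differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: hypothetical helpers, not part of Iceberg.
final class Int96Sketch {
  static final long UNIX_EPOCH_JULIAN_DAY = 2_440_588L; // Julian day of 1970-01-01
  static final long MICROS_PER_DAY = 86_400_000_000L;

  // Packs nanos-of-day and Julian day into the 12-byte INT96 layout.
  static byte[] encode(long nanosOfDay, int julianDay) {
    return ByteBuffer.allocate(12)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt(julianDay)
        .array();
  }

  // Converts a 12-byte INT96 value to microseconds since the Unix epoch.
  static long toEpochMicros(byte[] int96) {
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong();
    int julianDay = buf.getInt();
    return (julianDay - UNIX_EPOCH_JULIAN_DAY) * MICROS_PER_DAY + nanosOfDay / 1_000L;
  }
}
```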
   
   The fix is the same as the earlier proposed change discussed in #13486, and 
mirrors the offset handling used by other readers in the same class: 
https://github.com/apache/iceberg/blob/8e7ab3c881391487d3676fe31f53077e78f6375b/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/parquet/VectorizedParquetDefinitionLevelReader.java#L302
   
   
   Disclosure:
     - AI-assisted analysis and test drafting were used while investigating 
this issue. The final code and test were reviewed, edited, and validated 
manually. See: 
https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

