eric-maynard opened a new pull request, #13450:
URL: https://github.com/apache/iceberg/pull/13450

   While implementing new Parquet encodings (e.g. #13391), I noticed that our 
tests rely on generating Parquet data at test time. For some encodings, such 
as DELTA_BYTE_ARRAY, this is difficult because there is no reliable way to 
tell the writer to use a particular encoding for a particular field. 
   
   To address this gap, this PR introduces a new test `testGoldenFiles` along 
with several pre-generated Parquet files written using various encodings. I 
intend to add more files/encodings here as support for new encodings is 
introduced.
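
As a rough illustration of how a golden-file test could discover its inputs, the sketch below derives the encoding and column type from a file's location. It assumes the `encodings/<ENCODING>/<type>.parquet` layout seen in the example path in this description. The `GoldenFileName` class and its `parse` method are hypothetical names for illustration only and are not taken from this PR.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of the PR): splits a golden file path of the
// assumed form <root>/<ENCODING>/<type>.parquet into its two components.
class GoldenFileName {
  private static final String SUFFIX = ".parquet";

  final String encoding;
  final String type;

  private GoldenFileName(String encoding, String type) {
    this.encoding = encoding;
    this.type = type;
  }

  static GoldenFileName parse(Path root, Path file) {
    // Path of the file relative to the golden-file root, e.g. RLE/int32.parquet
    Path relative = root.relativize(file);
    if (relative.getNameCount() != 2) {
      throw new IllegalArgumentException("Expected <ENCODING>/<type>.parquet: " + relative);
    }

    String encoding = relative.getName(0).toString();
    String fileName = relative.getName(1).toString();
    if (!fileName.endsWith(SUFFIX)) {
      throw new IllegalArgumentException("Not a Parquet file: " + fileName);
    }

    // Strip the extension to recover the type name, e.g. int32
    String type = fileName.substring(0, fileName.length() - SUFFIX.length());
    return new GoldenFileName(encoding, type);
  }

  public static void main(String[] args) {
    GoldenFileName name =
        parse(Paths.get("encodings"), Paths.get("encodings", "RLE", "int32.parquet"));
    System.out.println(name.encoding + " / " + name.type);
  }
}
```

A test could walk the resource directory, call `parse` on each file, and use the result to pick the expected schema and to label failures by encoding.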
   
   I generated these files using [this small 
util](https://github.com/eric-maynard/parquet-golden-files) and manually 
validated the encodings with `parquet-tools`, e.g.:
   
   ```
   $ parquet-tools inspect --detail ~/iceberg/spark/v4.0/spark/src/test/resources/encodings/RLE/int32.parquet 
   FileMetaData
   . . .
   ■■■■■■■■■■■■■■■■■■■■■■■■encodings = list
   ■■■■■■■■■■■■■■■■■■■■■■■■■■■■0
   ■■■■■■■■■■■■■■■■■■■■■■■■■■■■3
   ■■■■■■■■■■■■■■■■■■■■■■■■■■■■8
   . . .
   ```
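
For readers unfamiliar with the numeric values in that output: they are the ids of the `Encoding` enum in the Parquet format's Thrift definition. The small sketch below maps them to names. The `ParquetEncodingIds` class is a hypothetical helper for illustration and is not part of this PR or of any Parquet library.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps the numeric Encoding ids from the Parquet
// Thrift definition to their names.
class ParquetEncodingIds {
  private static final Map<Integer, String> NAMES = Map.of(
      0, "PLAIN",
      2, "PLAIN_DICTIONARY",
      3, "RLE",
      4, "BIT_PACKED",
      5, "DELTA_BINARY_PACKED",
      6, "DELTA_LENGTH_BYTE_ARRAY",
      7, "DELTA_BYTE_ARRAY",
      8, "RLE_DICTIONARY",
      9, "BYTE_STREAM_SPLIT");

  static String name(int id) {
    // Ids not listed above are reported rather than guessed.
    return NAMES.getOrDefault(id, "UNKNOWN(" + id + ")");
  }

  public static void main(String[] args) {
    // The ids listed for the RLE/int32.parquet file above.
    for (int id : List.of(0, 3, 8)) {
      System.out.println(id + " -> " + name(id));
    }
  }
}
```

Under this mapping, the `0`, `3`, `8` entries shown above correspond to PLAIN, RLE, and RLE_DICTIONARY.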


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

