Hi all,

I found this page via Google when searching for a description of the parquet 
binary format: https://parquet.apache.org/docs/file-format/data-pages/. This 
page suggests that definition levels are written before repetition levels.

However, after experimenting with parquet files generated by pandas and pyarrow 
and reading the arrow source code (especially InitializeLevelDecoders in 
https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc), I 
strongly believe that repetition levels are written before definition levels. 
The format description at https://github.com/apache/parquet-format also places 
repetition levels before definition levels.
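
To make the ordering concrete, here is a minimal sketch of how I understand the 
payload of a V1 data page to be laid out. It assumes the page is already 
decompressed and that both level streams are RLE-encoded with a 4-byte 
little-endian length prefix. The function name split_data_page_v1 is my own 
invention for illustration and is not part of arrow or any parquet library.

```python
import struct


def split_data_page_v1(payload: bytes, max_rep_level: int, max_def_level: int):
    """Split an uncompressed V1 data page payload into its three parts.

    Assumed layout (my reading of apache/parquet-format, not verified against
    every writer): repetition levels, then definition levels, then values.
    A level stream is only present when its maximum level is greater than 0.
    """
    offset = 0

    def read_level_block(max_level: int) -> bytes:
        nonlocal offset
        if max_level == 0:
            # Nothing is written for this stream when the max level is 0.
            return b""
        # Assumption: RLE-encoded levels carry a 4-byte little-endian length.
        (length,) = struct.unpack_from("<I", payload, offset)
        offset += 4
        block = payload[offset:offset + length]
        offset += length
        return block

    rep_levels = read_level_block(max_rep_level)  # read first
    def_levels = read_level_block(max_def_level)  # read second
    values = payload[offset:]                     # the remainder
    return rep_levels, def_levels, values
```

If the order were the other way round, as the parquet.apache.org page suggests, 
the two read_level_block calls would simply be swapped, and a reader written 
against one order would misinterpret pages written in the other whenever both 
streams are present.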

The content of the parquet.apache.org/docs site appears to be tracked on GitHub 
under https://github.com/apache/parquet-site. Is the documentation content 
still being actively updated? Has there been an effort to synchronize the 
format descriptions under apache/parquet-site with those under 
apache/parquet-format?

Kind regards

Kaili
