Hi all,

I found this page via Google while searching for a description of the Parquet binary format: https://parquet.apache.org/docs/file-format/data-pages/. This page suggests that, within a data page, definition levels are written before repetition levels.
However, I experimented with Parquet files generated by pandas and pyarrow and read through the Arrow source code, especially InitializeLevelDecoders in https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc. Based on that, I strongly believe that repetition levels are written before definition levels. The format documentation in https://github.com/apache/parquet-format also places repetition levels before definition levels.

The content of the parquet.apache.org/docs site appears to be tracked on GitHub in https://github.com/apache/parquet-site. Is that documentation still being actively updated? Has there been any effort to synchronize the format descriptions in apache/parquet-site with those in apache/parquet-format?

Kind regards,
Kaili