Hey Antoine,

Thanks for raising this. In Iceberg we also use the 1 MiB page size:
https://github.com/apache/iceberg/blob/b3c25fb7608934d975a054b353823ca001ca3742/core/src/main/java/org/apache/iceberg/TableProperties.java#L133

Kind regards,
Fokko

On Thu, 23 May 2024 at 10:06, Antoine Pitrou <anto...@python.org> wrote:
>
> Hello,
>
> The Parquet format itself (or at least the README) recommends an 8 KiB
> page size, suggesting that data pages are the unit of computation.
>
> However, Parquet C++ has long chosen a 1 MiB page size by default (*),
> suggesting that data pages are considered the unit of IO there.
>
> (*) even bumping it to 64 MiB at some point, perhaps by mistake:
>
> https://github.com/apache/arrow/commit/4078b876e0cc7503f4da16693ce7901a6ae503d3
>
> What are the typical choices in other writers?
>
> Regards
>
> Antoine.
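For a rough sense of scale between the page sizes mentioned above, here is a sketch of the arithmetic only. It assumes a hypothetical 128 MiB column chunk and treats the page size as an exact limit; real writers treat it as an approximate target on the encoded size, so actual page counts will differ.

```python
# Rough illustration (hypothetical numbers): how many data pages a single
# column chunk is split into under different target page sizes.

KIB = 1024
MIB = 1024 * KIB


def pages_per_chunk(chunk_bytes: int, page_bytes: int) -> int:
    """Number of pages needed to hold chunk_bytes, rounding up."""
    return -(-chunk_bytes // page_bytes)  # ceiling division


# Assumption: a 128 MiB column chunk; real chunk sizes depend on the
# row group size and on the column's data.
CHUNK = 128 * MIB

for label, page in [("8 KiB", 8 * KIB), ("1 MiB", 1 * MIB), ("64 MiB", 64 * MIB)]:
    print(f"{label:>6} pages: {pages_per_chunk(CHUNK, page)} per chunk")
```

Under these assumptions that is 16384 pages at 8 KiB, 128 at 1 MiB and 2 at 64 MiB: fewer, larger pages mean fewer page headers and larger reads, while more, smaller pages give finer-grained units to skip or decode.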