Hey Antoine,

Thanks for raising this. In Iceberg we also default to a 1 MiB page size:

https://github.com/apache/iceberg/blob/b3c25fb7608934d975a054b353823ca001ca3742/core/src/main/java/org/apache/iceberg/TableProperties.java#L133

Kind regards,
Fokko

On Thu, 23 May 2024 at 10:06, Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> The Parquet format itself (or at least the README) recommends an 8 KiB
> page size, suggesting that data pages are the unit of computation.
>
> However, Parquet C++ has long chosen a 1 MiB page size by default (*),
> suggesting that data pages are considered as the unit of IO there.
>
> (*) even bumping it to 64 MiB at some point, perhaps by mistake:
>
> https://github.com/apache/arrow/commit/4078b876e0cc7503f4da16693ce7901a6ae503d3
>
> What are the typical choices in other writers?
>
> Regards
>
> Antoine.
>
>
>
