Zoltan Ivanfi created PARQUET-1190:
--------------------------------------
Summary: Use the same default page size across different language
bindings
Key: PARQUET-1190
URL: https://issues.apache.org/jira/browse/PARQUET-1190
Project: Parquet
Issue Type: Task
Reporter: Zoltan Ivanfi
Currently there are many different page size recommendations/defaults in use:
* [parquet-format|https://github.com/apache/parquet-format#configurations] recommends 8 KB.
* [parquet-mr|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L46] uses 1 MB.
* [Impala|https://github.com/apache/impala/blob/daff8eb0ca19aa612c9fc7cc2ddd647735b31266/be/src/exec/hdfs-parquet-table-writer.h#L83] uses 64 KB.
These values (and those used by other language bindings not listed above)
should be consistent.
To pick a sensible new value, we may need to do some measurements. Because of
this, we shall wait for column indexes to be implemented before picking a new
value.
The new default page size no longer has to be a single value. We have several
options:
* A single default page size, as before.
* Different page size defaults depending on the type.
* Using a specified number of values instead of data size (e.g., every page
contains 10000 values).
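
To make the three options concrete, here is a minimal Python sketch of how a writer might decide when to close a page under each of them. The `PagePolicy` class, its field names, and the per-type limit are hypothetical and exist only for illustration; they are not taken from parquet-mr, Impala, or any other implementation. The 64 KB and 10000-value figures are reused from the text above purely as sample inputs, not as proposed defaults.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PagePolicy:
    """Hypothetical page-flush policy (illustration only, not a real Parquet API)."""

    default_bytes: Optional[int] = None            # option 1: one size limit for all columns
    bytes_by_type: Optional[Dict[str, int]] = None  # option 2: size limit per physical type
    max_values: Optional[int] = None                # option 3: limit on the number of values

    def should_flush(self, physical_type: str,
                     buffered_bytes: int, buffered_values: int) -> bool:
        # Option 3: when a value-count limit is set, the data size is ignored.
        if self.max_values is not None:
            return buffered_values >= self.max_values
        # Option 2 falls back to option 1 for types without their own limit.
        limit = (self.bytes_by_type or {}).get(physical_type, self.default_bytes)
        if limit is None:
            return False  # no limit configured, never flush on this criterion
        return buffered_bytes >= limit


# Sample policies; all numbers are assumptions chosen for the example.
single = PagePolicy(default_bytes=64 * 1024)
per_type = PagePolicy(default_bytes=64 * 1024,
                      bytes_by_type={"BYTE_ARRAY": 1024 * 1024})
by_count = PagePolicy(max_values=10000)
```

With these sample policies, `per_type` would keep buffering a `BYTE_ARRAY` column past 64 KB while flushing an `INT32` column at that size, and `by_count` would close a page after 10000 values regardless of how many bytes they occupy.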