Zoltan Ivanfi created PARQUET-1190:
--------------------------------------

             Summary: Use the same default page size across different language 
bindings
                 Key: PARQUET-1190
                 URL: https://issues.apache.org/jira/browse/PARQUET-1190
             Project: Parquet
          Issue Type: Task
            Reporter: Zoltan Ivanfi


Currently there are many different page size recommendations/defaults in use:
* [parquet-format|https://github.com/apache/parquet-format#configurations] recommends 8 KB.
* [parquet-mr|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L46] uses 1 MB.
* [Impala|https://github.com/apache/impala/blob/daff8eb0ca19aa612c9fc7cc2ddd647735b31266/be/src/exec/hdfs-parquet-table-writer.h#L83] uses 64 KB.

These values (and those of other language bindings not listed above) should 
be consistent.

To pick a sensible new value, we may need to do some measurements. Because of 
this, we shall wait for column indexes to be implemented before picking a new 
value.

The new default page size does not necessarily have to be a single value any 
more; we have several options:
* A single default page size, as before.
* Different page size defaults depending on the type.
* Using a specified number of values instead of data size (e.g., every page 
contains 10000 values).
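
To make the three options concrete, here is a minimal sketch of how a writer might decide when to close a page under each of them. Every name in it (PageLimit, should_flush, limit_for) is hypothetical and not taken from parquet-mr, Impala, or parquet-format. The byte sizes are placeholders only; the 64 KB single default merely reuses the Impala figure quoted above, and the per-type numbers are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, Optional

KB = 1024


@dataclass(frozen=True)
class PageLimit:
    """Hypothetical page-closing rule: a byte limit, a value-count limit, or both."""
    max_bytes: Optional[int] = None
    max_values: Optional[int] = None

    def should_flush(self, buffered_bytes: int, buffered_values: int) -> bool:
        # Close the page as soon as any configured limit is reached.
        by_size = self.max_bytes is not None and buffered_bytes >= self.max_bytes
        by_count = self.max_values is not None and buffered_values >= self.max_values
        return by_size or by_count


# Option 1: a single default page size (placeholder value, not a recommendation).
SINGLE_DEFAULT = PageLimit(max_bytes=64 * KB)

# Option 2: different defaults depending on the type (arbitrary placeholder values).
PER_TYPE: Dict[str, PageLimit] = {
    "INT32": PageLimit(max_bytes=32 * KB),
    "BYTE_ARRAY": PageLimit(max_bytes=256 * KB),
}

# Option 3: a fixed number of values per page, using the 10000 from the example above.
BY_VALUE_COUNT = PageLimit(max_values=10_000)


def limit_for(physical_type: str) -> PageLimit:
    """Look up the per-type rule, falling back to the single default."""
    return PER_TYPE.get(physical_type, SINGLE_DEFAULT)
```

Under the value-count rule every page holds the same number of rows' worth of values regardless of type, whereas under a byte limit the number of values per page varies with the encoded width of the column.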




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
