Shan Huang created PARQUET-2077:
-----------------------------------
Summary: The number of values in a miniblock should be a multiple of
32 instead of 8 in DeltaBinaryPackingConfig
Key: PARQUET-2077
URL: https://issues.apache.org/jira/browse/PARQUET-2077
Project: Parquet
Issue Type: Wish
Components: parquet-mr
Reporter: Shan Huang
In the code of
[DeltaBinaryPackingValuesWriter|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L82],
the parameters are always DEFAULT_NUM_BLOCK_VALUES (which is 128) and
DEFAULT_NUM_MINIBLOCKS (which is 4). So if a file is written by parquet-mr,
the number of values in a miniblock is always 128 / 4 = 32, which is consistent
with the
[spec|https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5].
However, the code in
[DeltaBinaryPackingConfig|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingConfig.java#L41]
indicates that the number of values in a miniblock must be a multiple of 8.
Would it be better to change that restriction to a multiple of 32?
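
To illustrate the difference between the two restrictions, here is a minimal
sketch. It is not the parquet-mr code: the class MiniBlockCheck and the method
miniBlockSize are hypothetical names made up for this example, and the
required multiple is passed in as a parameter so both rules can be compared.

```java
public class MiniBlockCheck {

    // Hypothetical helper, not part of parquet-mr: computes the number of
    // values per miniblock and rejects sizes that are not a multiple of
    // requiredMultiple.
    public static int miniBlockSize(int blockSizeInValues, int miniBlockNum, int requiredMultiple) {
        if (miniBlockNum <= 0 || blockSizeInValues % miniBlockNum != 0) {
            throw new IllegalArgumentException(
                "block size " + blockSizeInValues + " is not divisible by " + miniBlockNum);
        }
        int miniSize = blockSizeInValues / miniBlockNum;
        if (miniSize % requiredMultiple != 0) {
            throw new IllegalArgumentException(
                "miniblock size must be a multiple of " + requiredMultiple + ", but it is " + miniSize);
        }
        return miniSize;
    }

    public static void main(String[] args) {
        // The defaults mentioned above: 128 values per block, 4 miniblocks.
        System.out.println(miniBlockSize(128, 4, 32));

        // 128 / 8 = 16 values per miniblock: passes a multiple-of-8 check...
        System.out.println(miniBlockSize(128, 8, 8));

        // ...but is rejected by a multiple-of-32 check.
        try {
            miniBlockSize(128, 8, 32);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the default parameters both checks pass, so a stricter check would only
affect configurations with non-default values, such as 128 values split into
8 miniblocks.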