[ https://issues.apache.org/jira/browse/PARQUET-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shan Huang updated PARQUET-2077:
--------------------------------
Description:
In
[DeltaBinaryPackingValuesWriter|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L82],
the parameters are always DEFAULT_NUM_BLOCK_VALUES (which is 128) and
DEFAULT_NUM_MINIBLOCKS (which is 4), so a file written by parquet-mr always
has 32 values per miniblock. This is consistent with the
[spec|https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5],
which requires the number of values in a miniblock to be a multiple of 32.
However, the code in
[DeltaBinaryPackingConfig|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingConfig.java#L41]
only requires the number of values in a miniblock to be a multiple of 8.
Would it be better to change that limit to a multiple of 32?
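
To make the difference concrete, here is a minimal sketch comparing the two limits. It is not the parquet-mr code: the class MiniblockSizeCheck and its methods are hypothetical names used only for illustration, and the "multiple of 8" rule is taken from the description above rather than copied from DeltaBinaryPackingConfig.

```java
public class MiniblockSizeCheck {

    // Hypothetical helper: values per miniblock, or -1 if the block
    // cannot be split evenly into the requested number of miniblocks.
    static int miniblockSize(int blockSizeInValues, int miniblocksPerBlock) {
        if (miniblocksPerBlock <= 0 || blockSizeInValues % miniblocksPerBlock != 0) {
            return -1;
        }
        return blockSizeInValues / miniblocksPerBlock;
    }

    // The limit described in this issue: miniblock size is a multiple of 8.
    static boolean passesMultipleOf8(int blockSizeInValues, int miniblocksPerBlock) {
        int size = miniblockSize(blockSizeInValues, miniblocksPerBlock);
        return size > 0 && size % 8 == 0;
    }

    // The stricter limit proposed here: miniblock size is a multiple of 32.
    static boolean passesMultipleOf32(int blockSizeInValues, int miniblocksPerBlock) {
        int size = miniblockSize(blockSizeInValues, miniblocksPerBlock);
        return size > 0 && size % 32 == 0;
    }

    public static void main(String[] args) {
        // Block size 128 with 4, 8, and 16 miniblocks per block.
        int[] miniblockCounts = {4, 8, 16};
        for (int count : miniblockCounts) {
            System.out.println("128/" + count
                + " size=" + miniblockSize(128, count)
                + " multipleOf8=" + passesMultipleOf8(128, count)
                + " multipleOf32=" + passesMultipleOf32(128, count));
        }
    }
}
```

With the defaults (128 values, 4 miniblocks) both checks pass. A configuration such as 128 values with 8 miniblocks gives 16 values per miniblock, which passes the multiple-of-8 check but would be rejected by a multiple-of-32 check.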
was:
In the code of
[DeltaBinaryPackingValuesWriter|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L82],
the parameters are always DEFAULT_NUM_BLOCK_VALUES(which is 128) and
DEFAULT_NUM_MINIBLOCKS(which is 4). So if the file is written by parquet-mr,
the number of values in a miniblock is always 32. It is consistent with the
[spec.|https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5]
However, the code in
[DeltaBinaryPackingConfig|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingConfig.java#L41]
indicate that the number of values in a miniblock must be multiple of 8. Would
it be better if the limitation were changed to 32?
> The number of values in a miniblock should be multiple of 32 instead of 8 in
> DeltaBinaryPackingConfig
> -----------------------------------------------------------------------------------------------------
>
> Key: PARQUET-2077
> URL: https://issues.apache.org/jira/browse/PARQUET-2077
> Project: Parquet
> Issue Type: Wish
> Components: parquet-mr
> Reporter: Shan Huang
> Priority: Major