[ 
https://issues.apache.org/jira/browse/PARQUET-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shan Huang updated PARQUET-2077:
--------------------------------
    Description: 
In the code of 
[DeltaBinaryPackingValuesWriter|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L82],
 the parameters are always DEFAULT_NUM_BLOCK_VALUES (which is 128) and 
DEFAULT_NUM_MINIBLOCKS (which is 4). So if a file is written by parquet-mr, 
the number of values in a miniblock is always 32. This is consistent with the 
[spec.|https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5]
 However, the code in 
[DeltaBinaryPackingConfig|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingConfig.java#L41]
 indicates that the number of values in a miniblock must be a multiple of 8. 
Would it be better if the requirement were changed to a multiple of 32?

  was:
In the code of 
[DeltaBinaryPackingValuesWriter|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L82],
 the parameters are always DEFAULT_NUM_BLOCK_VALUES(which is 128) and 
DEFAULT_NUM_MINIBLOCKS(which is 4). So if the file is written by parquet-mr, 
the number of values in a miniblock is always 32. It is consistent with the 
[spec.|https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5]
However, the code in 
[DeltaBinaryPackingConfig|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingConfig.java#L41]
 indicate that the number of values in a miniblock must be multiple of 8. Would 
it be better if the limitation were changed to 32?


> The number of values in a miniblock should be multiple of 32 instead of 8 in 
> DeltaBinaryPackingConfig
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2077
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2077
>             Project: Parquet
>          Issue Type: Wish
>          Components: parquet-mr
>            Reporter: Shan Huang
>            Priority: Major
>
> In the code of 
> [DeltaBinaryPackingValuesWriter|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L82],
>  the parameters are always DEFAULT_NUM_BLOCK_VALUES (which is 128) and 
> DEFAULT_NUM_MINIBLOCKS (which is 4). So if a file is written by parquet-mr, 
> the number of values in a miniblock is always 32. This is consistent with the 
> [spec.|https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5]
>  However, the code in 
> [DeltaBinaryPackingConfig|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingConfig.java#L41]
>  indicates that the number of values in a miniblock must be a multiple of 8. 
> Would it be better if the requirement were changed to a multiple of 32?
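
To make the difference between the two limits concrete, here is a minimal Java sketch of the kind of check being discussed. It is not the parquet-mr source: the class MiniblockSizeCheck and its methods are hypothetical names made up for illustration, and the only values taken from the issue are the block size of 128, the miniblock count of 4, and the two candidate multiples (8 and 32).

```java
// Hypothetical illustration only; this is NOT the parquet-mr implementation.
final class MiniblockSizeCheck {
    private MiniblockSizeCheck() {}

    /**
     * Returns the number of values per miniblock, or throws if the block does
     * not split evenly or the result is not a multiple of requiredMultiple.
     */
    static int miniBlockSize(int blockSizeInValues, int miniBlockNum, int requiredMultiple) {
        if (miniBlockNum <= 0 || blockSizeInValues % miniBlockNum != 0) {
            throw new IllegalArgumentException(
                "a block of " + blockSizeInValues + " values does not split evenly into "
                    + miniBlockNum + " miniblocks");
        }
        int miniSize = blockSizeInValues / miniBlockNum;
        if (miniSize % requiredMultiple != 0) {
            throw new IllegalArgumentException(
                "miniblock size must be a multiple of " + requiredMultiple
                    + ", but it is " + miniSize);
        }
        return miniSize;
    }

    /** Convenience wrapper: true if the configuration passes the check. */
    static boolean isAccepted(int blockSizeInValues, int miniBlockNum, int requiredMultiple) {
        try {
            miniBlockSize(blockSizeInValues, miniBlockNum, requiredMultiple);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults quoted in the issue: 128 values per block, 4 miniblocks.
        System.out.println(miniBlockSize(128, 4, 8));  // prints 32
        // 64 / 4 = 16 values per miniblock: passes a multiple-of-8 check ...
        System.out.println(isAccepted(64, 4, 8));      // prints true
        // ... but would be rejected by a multiple-of-32 check.
        System.out.println(isAccepted(64, 4, 32));     // prints false
    }
}
```

With the defaults, both limits give the same result (32 values per miniblock). The two limits only behave differently for non-default configurations such as 64 values in 4 miniblocks.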



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
