Shan Huang created PARQUET-2077:
-----------------------------------

             Summary: The number of values in a miniblock should be a multiple of 
32 instead of 8 in DeltaBinaryPackingConfig
                 Key: PARQUET-2077
                 URL: https://issues.apache.org/jira/browse/PARQUET-2077
             Project: Parquet
          Issue Type: Wish
          Components: parquet-mr
            Reporter: Shan Huang


In 
[DeltaBinaryPackingValuesWriter|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingValuesWriter.java#L82],
 the parameters are always DEFAULT_NUM_BLOCK_VALUES (which is 128) and 
DEFAULT_NUM_MINIBLOCKS (which is 4). So a file written by parquet-mr always 
has 128 / 4 = 32 values per miniblock. This is consistent with the 
[spec|https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5],
 which requires the number of values in a miniblock to be a multiple of 32. 
However, the check in 
[DeltaBinaryPackingConfig|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/delta/DeltaBinaryPackingConfig.java#L41]
 only requires the number of values in a miniblock to be a multiple of 8. Would 
it be better if the limitation were changed to 32?
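
To make the difference concrete, here is a minimal stand-alone sketch of the two checks. It is not the parquet-mr code: the class MiniblockSizeCheck and its methods are hypothetical names made up for illustration, and only the constants 128, 4, 8 and 32 come from the description above. It shows a configuration (128 values per block, 8 miniblocks) that passes a multiple-of-8 check but would be rejected by a multiple-of-32 check.

```java
// Hypothetical illustration only; this is NOT the parquet-mr DeltaBinaryPackingConfig class.
public final class MiniblockSizeCheck {
    // Multiple the issue proposes, matching the DELTA_BINARY_PACKED spec.
    public static final int SPEC_MULTIPLE = 32;
    // Multiple the issue says DeltaBinaryPackingConfig currently enforces.
    public static final int CURRENT_MULTIPLE = 8;

    /**
     * Returns the number of values per miniblock, or throws if the block
     * cannot be split evenly or the result is not a multiple of requiredMultiple.
     */
    public static int miniBlockSize(int blockSizeInValues, int miniBlockNum, int requiredMultiple) {
        if (blockSizeInValues <= 0 || miniBlockNum <= 0 || blockSizeInValues % miniBlockNum != 0) {
            throw new IllegalArgumentException(
                "block size " + blockSizeInValues + " is not evenly divisible into "
                    + miniBlockNum + " miniblocks");
        }
        int miniSize = blockSizeInValues / miniBlockNum;
        if (miniSize % requiredMultiple != 0) {
            throw new IllegalArgumentException(
                "miniblock size must be a multiple of " + requiredMultiple + ", but it is " + miniSize);
        }
        return miniSize;
    }

    /** Convenience wrapper: true if the configuration passes the check. */
    public static boolean isValid(int blockSizeInValues, int miniBlockNum, int requiredMultiple) {
        try {
            miniBlockSize(blockSizeInValues, miniBlockNum, requiredMultiple);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Writer defaults from the description: 128 values per block, 4 miniblocks.
        System.out.println(miniBlockSize(128, 4, SPEC_MULTIPLE)); // prints 32
        // 128 / 8 = 16 values per miniblock: accepted by the multiple-of-8 check...
        System.out.println(isValid(128, 8, CURRENT_MULTIPLE));    // prints true
        // ...but it would be rejected by a multiple-of-32 check.
        System.out.println(isValid(128, 8, SPEC_MULTIPLE));       // prints false
    }
}
```

With the writer defaults the two checks agree, so the stricter limit would only matter for callers that construct the config with non-default parameters.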



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
