Edward Seidl created PARQUET-2435:
-------------------------------------
Summary: Clarify behavior of DELTA_BINARY_PACKED encoders/decoders
Key: PARQUET-2435
URL: https://issues.apache.org/jira/browse/PARQUET-2435
Project: Parquet
Issue Type: Improvement
Components: parquet-format
Reporter: Edward Seidl
I brought this issue up some time ago on the mailing list [1]; in short, I
would like to add some clarification to the DELTA_BINARY_PACKED section of
Encodings.md. The issue is that while the specification does not limit the
number of bits that can be used to encode deltas, some readers expect a maximum
of 32 bits for INT32 data, and 64 bits for INT64 data [2]. I propose adding
verbiage to the specification to the effect that using 33 bits to encode
INT32 data (or 65 bits for INT64) is permitted but not recommended, and that
readers _should_ be able to read such data, but are not required to.
[1] https://lists.apache.org/thread/2wj88oghc0t6qqj8ojp5p5tf8wg11840
[2] https://github.com/apache/arrow/issues/20374
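
To illustrate how a 33-bit width can arise for INT32 data, here is a minimal
sketch in Python. It is a simplified single-block model, not the encoder from
any Parquet implementation: it ignores miniblocks and the page header, and the
helper names (to_int32, max_packed_width) are hypothetical. It assumes the
encoder subtracts the block's minimum delta before bit-packing, and compares
computing deltas in wider arithmetic against wrapping them to 32 bits.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def to_int32(x):
    """Wrap an arbitrary Python int into the signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def max_packed_width(values, wrap32):
    """Return the widest bit width needed for (delta - min_delta) in one block.

    wrap32=False models an encoder that computes deltas in wider arithmetic.
    wrap32=True models an encoder that lets deltas overflow as 32-bit ints.
    """
    deltas = [b - a for a, b in zip(values, values[1:])]
    if wrap32:
        deltas = [to_int32(d) for d in deltas]
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    if wrap32:
        # Treat the adjusted deltas as unsigned 32-bit quantities.
        adjusted = [a & 0xFFFFFFFF for a in adjusted]
    return max(a.bit_length() for a in adjusted)


# Swinging between the extremes of INT32 produces the widest possible deltas.
values = [0, INT32_MAX, INT32_MIN, INT32_MAX]
wide = max_packed_width(values, wrap32=False)
narrow = max_packed_width(values, wrap32=True)
print(wide, narrow)
```

Under these assumptions the non-wrapping variant needs 33 bits for this input,
while the wrapping variant stays within 32. A reader that performs the same
wrapping addition when decoding would recover the original values in either
case, provided it can unpack the wider width.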