This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new f65d4e1 PARQUET-2435: Clarify behavior of DELTA_BINARY_PACKED encoding (#231)
f65d4e1 is described below
commit f65d4e19a00955cc7b964c418708750055dde9d1
Author: Ed Seidl <[email protected]>
AuthorDate: Wed Feb 28 03:35:08 2024 -0800
PARQUET-2435: Clarify behavior of DELTA_BINARY_PACKED encoding (#231)
Address the issue of using more bits in the encoding than are used in
the underlying type being encoded.
---
Encodings.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Encodings.md b/Encodings.md
index aaf7a36..5040094 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -245,7 +245,9 @@ Subtractions in steps 1) and 2) may incur signed arithmetic overflow, and so
will the corresponding additions when decoding. Overflow should be allowed
and handled as wrapping around in 2's complement notation so that the original
values are correctly restituted. This may require explicit care in some programming
-languages (for example by doing all arithmetic in the unsigned domain).
+languages (for example by doing all arithmetic in the unsigned domain). Writers
+must not use more bits when bit packing the miniblock data than would be required
+to PLAIN encode the physical type (e.g. INT32 data must not use more than 32 bits).
The following examples use 8 as the block size to keep the examples short,
but in real cases it would be invalid.
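
The rule added by this commit can be illustrated with a minimal Python sketch for INT32 data. It is not taken from parquet-format or any Parquet implementation; the helper names (to_signed32, encode_deltas, bit_width, decode_deltas) are hypothetical, and block/miniblock framing is omitted. All arithmetic is masked to 32 bits, which emulates the unsigned-domain wrapping the text describes.

```python
MASK32 = 0xFFFFFFFF  # INT32 physical type: PLAIN encoding uses 32 bits


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a 2's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def encode_deltas(values):
    """Return (min_delta, adjusted_deltas) using wrapping 32-bit arithmetic."""
    # Step 1: deltas between consecutive values, wrapped to 32 bits.
    deltas = [(b - a) & MASK32 for a, b in zip(values, values[1:])]
    if not deltas:
        return 0, []
    # Step 2: subtract the minimum (signed) delta, again with wrapping.
    min_delta = min(to_signed32(d) for d in deltas)
    adjusted = [(d - min_delta) & MASK32 for d in deltas]
    return min_delta, adjusted


def bit_width(adjusted):
    """Bits needed to bit pack the adjusted deltas (hypothetical helper)."""
    return max(adjusted, default=0).bit_length()


def decode_deltas(first, min_delta, adjusted):
    """Rebuild the values; additions wrap the same way the subtractions did."""
    out = [first]
    for d in adjusted:
        out.append(to_signed32(out[-1] + min_delta + d))
    return out


values = [2147483647, -2147483648, 0]
min_delta, adjusted = encode_deltas(values)
assert bit_width(adjusted) <= 32
assert decode_deltas(values[0], min_delta, adjusted) == values
```

For this input, computing the deltas in unbounded integers instead of masking would give an adjusted delta of 6442450943, which needs 33 bits and would therefore violate the new requirement.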