This is an automated email from the ASF dual-hosted git repository.
deshanxiao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new 83879811e ORC-642: update PatchedBase doc with patch ceiling in spec
83879811e is described below
commit 83879811e31f1c2538fe532ef1d0fb59f21e8ec4
Author: Jefffrey <[email protected]>
AuthorDate: Wed Apr 10 18:25:36 2024 +0800
ORC-642: update PatchedBase doc with patch ceiling in spec
### What changes were proposed in this pull request?
Update PatchedBase specification doc to include details about the behaviour
of padding the patch gap + patch width bits to the nearest fixed bits.
### Why are the changes needed?
Ensure spec is accurate to implementation details
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #1868 from Jefffrey/ORC-642.
Authored-by: Jefffrey <[email protected]>
Signed-off-by: deshanxiao <[email protected]>
---
site/specification/ORCv1.md | 23 ++++++++++++++++++++---
site/specification/ORCv2.md | 23 ++++++++++++++++++++---
2 files changed, 40 insertions(+), 6 deletions(-)
diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index d29280903..9aede7a4a 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -824,14 +824,31 @@ the index values and the additional value bits.
bit is set, the entire value is negated.
* Data values (W * L bits padded to the byte) - A sequence of W bit positive
values that are added to the base value.
-* Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
- that didn't fit within W bits. Each entry in the list consists of a
+* Patch list (PLL * closestFixedBits(PGW + PW) bits) - A list of patches for
+ values that didn't fit within W bits. Each entry in the list consists of a
gap, which is the number of elements skipped from the previous
patch, and a patch value. Patches are applied by logically or'ing
the data values with the relevant patch shifted W bits left. If a
patch is 0, it was introduced to skip over more than 255 items. The
combined length of each patch (PGW + PW) must be less or equal to
- 64.
+ 64. (PGW + PW) is padded to the closest fixed bit size according to the
+ below table before being encoded in the patch list.
+
+(PGW + PW) | closestFixedBits(PGW + PW)
+:------------ | :-------------
+1 <= x <= 24 | x
+25 | 26
+26 | 26
+27 | 28
+28 | 28
+29 | 30
+30 | 30
+31 | 32
+32 | 32
+33 <= x <= 40 | 40
+41 <= x <= 48 | 48
+49 <= x <= 56 | 56
+57 <= x <= 64 | 64
The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070,
2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 64fccaf8f..2e0c35462 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -843,14 +843,31 @@ the index values and the additional value bits.
bit is set, the entire value is negated.
* Data values (W * L bits padded to the byte) - A sequence of W bit positive
values that are added to the base value.
-* Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
- that didn't fit within W bits. Each entry in the list consists of a
+* Patch list (PLL * closestFixedBits(PGW + PW) bits) - A list of patches for
+ values that didn't fit within W bits. Each entry in the list consists of a
gap, which is the number of elements skipped from the previous
patch, and a patch value. Patches are applied by logically or'ing
the data values with the relevant patch shifted W bits left. If a
patch is 0, it was introduced to skip over more than 255 items. The
combined length of each patch (PGW + PW) must be less or equal to
- 64.
+ 64. (PGW + PW) is padded to the closest fixed bit size according to the
+ below table before being encoded in the patch list.
+
+(PGW + PW) | closestFixedBits(PGW + PW)
+:------------ | :-------------
+1 <= x <= 24 | x
+25 | 26
+26 | 26
+27 | 28
+28 | 28
+29 | 30
+30 | 30
+31 | 32
+32 | 32
+33 <= x <= 40 | 40
+41 <= x <= 48 | 48
+49 <= x <= 56 | 56
+57 <= x <= 64 | 64
The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070,
2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
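
The `closestFixedBits` table added by this commit can be restated as a small function. The following is a minimal Python sketch, not the ORC implementation: the names `closest_fixed_bits` and `patch_list_bits` are hypothetical, and the arithmetic (round odd widths up to even for 25..32, round up to a multiple of 8 above 32) is only a compact rewriting of the table rows.

```python
def closest_fixed_bits(n: int) -> int:
    """Map a combined patch width (PGW + PW) to its padded width.

    Hypothetical helper that mirrors the table in the diff above.
    """
    if not 1 <= n <= 64:
        # The spec requires PGW + PW to be at most 64.
        raise ValueError(f"PGW + PW must be in 1..64, got {n}")
    if n <= 24:
        # 1 <= x <= 24: every width is available as-is.
        return n
    if n <= 32:
        # 25..32: only even widths (26, 28, 30, 32) are available.
        return n + (n % 2)
    # 33..64: round up to the next multiple of 8 (40, 48, 56, 64).
    return ((n + 7) // 8) * 8


def patch_list_bits(pll: int, pgw: int, pw: int) -> int:
    """Size of the patch list in bits: PLL * closestFixedBits(PGW + PW)."""
    return pll * closest_fixed_bits(pgw + pw)


if __name__ == "__main__":
    for width in (14, 25, 31, 33, 57):
        print(width, "->", closest_fixed_bits(width))
```

For instance, a patch list with one entry, a 2-bit gap and a 12-bit patch has a combined width of 14, which falls in the `1 <= x <= 24` row and stays at 14 bits, while a combined width of 25 would be padded to 26 bits per entry.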