This is an automated email from the ASF dual-hosted git repository.

deshanxiao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/main by this push:
     new 83879811e ORC-642: update PatchedBase doc with patch ceiling in spec
83879811e is described below

commit 83879811e31f1c2538fe532ef1d0fb59f21e8ec4
Author: Jefffrey <[email protected]>
AuthorDate: Wed Apr 10 18:25:36 2024 +0800

    ORC-642: update PatchedBase doc with patch ceiling in spec
    
    ### What changes were proposed in this pull request?
    
    Update PatchedBase specification doc to include details about the behaviour
    of padding the patch gap + patch width bits to the nearest fixed bits.
    
    ### Why are the changes needed?
    
    Ensure spec is accurate to implementation details
    
    ### How was this patch tested?
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #1868 from Jefffrey/ORC-642.
    
    Authored-by: Jefffrey <[email protected]>
    Signed-off-by: deshanxiao <[email protected]>
---
 site/specification/ORCv1.md | 23 ++++++++++++++++++++---
 site/specification/ORCv2.md | 23 ++++++++++++++++++++---
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index d29280903..9aede7a4a 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -824,14 +824,31 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
-  that didn't fit within W bits. Each entry in the list consists of a
+* Patch list (PLL * closestFixedBits(PGW + PW) bits) - A list of patches for
+  values that didn't fit within W bits. Each entry in the list consists of a
   gap, which is the number of elements skipped from the previous
   patch, and a patch value. Patches are applied by logically or'ing
   the data values with the relevant patch shifted W bits left. If a
   patch is 0, it was introduced to skip over more than 255 items. The
   combined length of each patch (PGW + PW) must be less or equal to
-  64.
+  64. (PGW + PW) is padded to the closest fixed bit size according to the
+  below table before being encoded in the patch list.
+
+(PGW + PW)    | closestFixedBits(PGW + PW)
+:------------ | :-------------
+1 <= x <= 24  | x
+25            | 26
+26            | 26
+27            | 28
+28            | 28
+29            | 30
+30            | 30
+31            | 32
+32            | 32
+33 <= x <= 40 | 40
+41 <= x <= 48 | 48
+49 <= x <= 56 | 56
+57 <= x <= 64 | 64
 
 The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070,
 2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 64fccaf8f..2e0c35462 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -843,14 +843,31 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Patch list (PLL * (PGW + PW) bytes) - A list of patches for values
-  that didn't fit within W bits. Each entry in the list consists of a
+* Patch list (PLL * closestFixedBits(PGW + PW) bits) - A list of patches for
+  values that didn't fit within W bits. Each entry in the list consists of a
   gap, which is the number of elements skipped from the previous
   patch, and a patch value. Patches are applied by logically or'ing
   the data values with the relevant patch shifted W bits left. If a
   patch is 0, it was introduced to skip over more than 255 items. The
   combined length of each patch (PGW + PW) must be less or equal to
-  64.
+  64. (PGW + PW) is padded to the closest fixed bit size according to the
+  below table before being encoded in the patch list.
+
+(PGW + PW)    | closestFixedBits(PGW + PW)
+:------------ | :-------------
+1 <= x <= 24  | x
+25            | 26
+26            | 26
+27            | 28
+28            | 28
+29            | 30
+30            | 30
+31            | 32
+32            | 32
+33 <= x <= 40 | 40
+41 <= x <= 48 | 48
+49 <= x <= 56 | 56
+57 <= x <= 64 | 64
 
 The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070,
 2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190]
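
For readers checking the table added in this commit, the mapping can be sketched in a few lines of Python. This is only an illustration derived from the table rows above, not code from the ORC repository; the names `closest_fixed_bits` and `patch_list_bits` are hypothetical and chosen for this example.

```python
def closest_fixed_bits(n: int) -> int:
    """Round a combined patch width (PGW + PW) up to the fixed bit size
    listed in the table. Hypothetical helper, derived from the table only."""
    if not 1 <= n <= 64:
        raise ValueError("PGW + PW must be between 1 and 64 bits")
    if n <= 24:
        # 1..24 are encoded as-is.
        return n
    if n <= 32:
        # 25..32 round up to the next even width (26, 28, 30, 32).
        return n + (n % 2)
    # 33..64 round up to the next multiple of 8 (40, 48, 56, 64).
    return (n + 7) // 8 * 8


def patch_list_bits(pll: int, pgw: int, pw: int) -> int:
    """Size of the patch list in bits, following the formula in the diff:
    PLL * closestFixedBits(PGW + PW). Hypothetical helper."""
    return pll * closest_fixed_bits(pgw + pw)
```

Under these assumptions, a patch list with PLL = 3, PGW = 5 and PW = 22 has a combined width of 27 bits, which the table pads to 28, giving 84 bits in total.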
