(orc) branch main updated: MINOR: Fix Patched Base doc in specification

dongjoon Tue, 09 Jul 2024 12:17:56 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git



The following commit(s) were added to refs/heads/main by this push:
     new cccbe7253 MINOR: Fix Patched Base doc in specification
cccbe7253 is described below

commit cccbe7253717c740fd1ed40d094e7496abc2329e
Author: Jefffrey <[email protected]>
AuthorDate: Tue Jul 9 12:17:23 2024 -0700

    MINOR: Fix Patched Base doc in specification
    
    ### What changes were proposed in this pull request?
    
    Fix patched base specification to state that only 5% of values are patched, not 10%
    
    ### Why are the changes needed?
    
    According to implementation:
    
    
https://github.com/apache/orc/blob/0828c2ff114f30c84e4a23fd42ed58c6615c6f97/java/core/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java#L535-L550
    
    - Also 10% of 512 doesn't fit in max patch list length of 31
    
    Also fix some formatting issues.
    
    Before:
    
    
![image](https://github.com/apache/orc/assets/22608443/69849f63-94f5-4da3-8338-70ef1dbc9ef5)
    
    After:
    
    
![image](https://github.com/apache/orc/assets/22608443/747cf944-9b3a-4367-b4f5-b6d8b2364f17)
    
    ### How was this patch tested?
    
    N/A
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #1948 from Jefffrey/patched-base-doc-fix.
    
    Authored-by: Jefffrey <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 site/specification/ORCv1.md | 8 ++++----
 site/specification/ORCv2.md | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index 9aede7a4a..dffbf9034 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -804,8 +804,8 @@ length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
 The patched base encoding is used for integer sequences whose bit
 widths varies a lot. The minimum signed value of the sequence is found
 and subtracted from the other values. The bit width of those adjusted
-values is analyzed and the 90 percentile of the bit width is chosen
-as W. The 10\% of values larger than W use patches from a patch list
+values is analyzed and the 95 percentile of the bit width is chosen
+as W. The 5% of values larger than W use patches from a patch list
 to set the additional bits. Patches are encoded as a list of gaps in
 the index values and the additional value bits.
 
@@ -830,8 +830,8 @@ the index values and the additional value bits.
   patch, and a patch value. Patches are applied by logically or'ing
   the data values with the relevant patch shifted W bits left. If a
   patch is 0, it was introduced to skip over more than 255 items. The
-  combined length of each patch (PGW + PW) must be less or equal to
-  64. (PGW + PW) is padded to the closest fixed bit size according to the
+  combined length of each patch (PGW + PW) must be less or equal to 64.
+  (PGW + PW) is padded to the closest fixed bit size according to the
   below table before being encoded in the patch list.
 
 (PGW + PW)    | closestFixedBits(PGW + PW)
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 2e0c35462..0c773990c 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -823,8 +823,8 @@ length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad,
 The patched base encoding is used for integer sequences whose bit
 widths varies a lot. The minimum signed value of the sequence is found
 and subtracted from the other values. The bit width of those adjusted
-values is analyzed and the 90 percentile of the bit width is chosen
-as W. The 10\% of values larger than W use patches from a patch list
+values is analyzed and the 95 percentile of the bit width is chosen
+as W. The 5% of values larger than W use patches from a patch list
 to set the additional bits. Patches are encoded as a list of gaps in
 the index values and the additional value bits.
 
@@ -849,8 +849,8 @@ the index values and the additional value bits.
   patch, and a patch value. Patches are applied by logically or'ing
   the data values with the relevant patch shifted W bits left. If a
   patch is 0, it was introduced to skip over more than 255 items. The
-  combined length of each patch (PGW + PW) must be less or equal to
-  64. (PGW + PW) is padded to the closest fixed bit size according to the
+  combined length of each patch (PGW + PW) must be less or equal to 64.
+  (PGW + PW) is padded to the closest fixed bit size according to the
   below table before being encoded in the patch list.
 
 (PGW + PW)    | closestFixedBits(PGW + PW)
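
The reasoning in the commit message (10% of a 512-value run does not fit in a patch list of at most 31 entries, while 5% does) and the patch rule quoted in the diff (OR the data value with the patch shifted W bits left) can be checked with a minimal Python sketch. It is illustrative only and is not taken from the ORC code base: the helper names `max_patched` and `apply_patches` are hypothetical, the list of (gap, patch) pairs is a simplification of the bit-packed patch list, and the sketch assumes the first gap is counted from index 0.

```python
# Limits quoted in the commit message above.
MAX_RUN_LENGTH = 512
MAX_PATCH_LIST_LENGTH = 31


def max_patched(percent, run_length=MAX_RUN_LENGTH):
    """Whole number of values that `percent` of a run corresponds to (rounded down)."""
    return run_length * percent // 100


def apply_patches(data, patches, width):
    """Return a copy of `data` with patches applied.

    data    -- list of W-bit base-adjusted values
    patches -- list of (gap, patch) pairs; each gap is added to a running
               index (assumption: the first gap is counted from index 0)
    width   -- W, the bit width chosen for the unpatched data values
    """
    out = list(data)
    index = 0
    for gap, patch in patches:
        index += gap
        # OR the value with the patch shifted W bits left. A patch of 0
        # leaves the value unchanged and only advances the index, which is
        # how gaps of more than 255 items are bridged.
        out[index] |= patch << width
    return out


# 10% of 512 is 51 values, more than the 31-entry limit; 5% is 25, which fits.
print(max_patched(10), max_patched(10) <= MAX_PATCH_LIST_LENGTH)
print(max_patched(5), max_patched(5) <= MAX_PATCH_LIST_LENGTH)

# Two values need extra high bits above W = 2.
print(apply_patches([0b01, 0b11, 0b10, 0b00], [(1, 0b1), (2, 0b10)], 2))
```

The last call restores the high bits of the values at indexes 1 and 3, giving `[1, 7, 2, 8]`.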
