[orc] branch master updated: ORC-464: [C++] avoid computing zigzag values for DELTA and SHORT_REPEAT encoding.

gangwu Mon, 25 Feb 2019 12:07:51 -0800

This is an automated email from the ASF dual-hosted git repository.

gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/orc.git



The following commit(s) were added to refs/heads/master by this push:
     new bce06ee  ORC-464: [C++] avoid computing zigzag values for DELTA and SHORT_REPEAT encoding.
bce06ee is described below

commit bce06eee103ef3d8e63b3a0fd9b05928eb39f48a
Author: Fang Zheng <[email protected]>
AuthorDate: Mon Feb 25 11:59:46 2019 -0800

    ORC-464: [C++] avoid computing zigzag values for DELTA and SHORT_REPEAT encoding.
    
    Fixes #361
    
    Signed-off-by: Gang Wu <[email protected]>
---
 c++/src/RleEncoderV2.cc | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/c++/src/RleEncoderV2.cc b/c++/src/RleEncoderV2.cc
index 51cf4c9..f3b6269 100644
--- a/c++/src/RleEncoderV2.cc
+++ b/c++/src/RleEncoderV2.cc
@@ -302,14 +302,16 @@ void RleEncoderV2::preparePatchedBlob(EncodingOption& option) {
 }
 
 void RleEncoderV2::determineEncoding(EncodingOption& option) {
-    // we need to compute zigzag values for DIRECT encoding if we decide to
-    // break early for delta overflows or for shorter runs
-    computeZigZagLiterals(option);
-
-    option.zzBits100p = percentileBits(zigzagLiterals, 0, numLiterals, 1.0);
+    // We need to compute zigzag values for DIRECT and PATCHED_BASE encodings,
+    // but not for SHORT_REPEAT or DELTA. So we only perform the zigzag
+    // computation when it's determined to be necessary.
 
     // not a big win for shorter runs to determine encoding
     if (numLiterals <= MIN_REPEAT) {
+        // we need to compute zigzag values for DIRECT encoding if we decide to
+        // break early for delta overflows or for shorter runs
+        computeZigZagLiterals(option);
+        option.zzBits100p = percentileBits(zigzagLiterals, 0, numLiterals, 1.0);
         option.encoding = DIRECT;
         return;
     }
@@ -349,6 +351,8 @@ void RleEncoderV2::determineEncoding(EncodingOption& option) {
     // PATCHED_BASE condition as encoding using DIRECT is faster and has less
     // overhead than PATCHED_BASE
     if (!isSafeSubtract(max, option.min)) {
+        computeZigZagLiterals(option);
+        option.zzBits100p = percentileBits(zigzagLiterals, 0, numLiterals, 1.0);
         option.encoding = DIRECT;
         return;
     }
@@ -404,6 +408,8 @@ void RleEncoderV2::determineEncoding(EncodingOption& option) {
     // beyond a threshold then we need to patch the values. if the variation
     // is not significant then we can use direct encoding
 
+    computeZigZagLiterals(option);
+    option.zzBits100p = percentileBits(zigzagLiterals, 0, numLiterals, 1.0);
     option.zzBits90p = percentileBits(zigzagLiterals, 0, numLiterals, 0.9, true);
     uint32_t diffBitsLH = option.zzBits100p - option.zzBits90p;
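
For readers unfamiliar with the optimization, the idea in this commit can be sketched in isolation: zigzag values are only needed when the chosen encoding actually consumes them, so the computation can be deferred until that decision is made. The snippet below is a minimal standalone sketch and not the ORC implementation. The names zigZag, maxZigZagBits, Choice and chooseEncoding are hypothetical, and the classification rules (all values equal gives SHORT_REPEAT, constant difference gives DELTA, anything else gives DIRECT) are a deliberate simplification of what determineEncoding really does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zigzag maps signed integers to unsigned ones so that small magnitudes get
// small codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
// Assumes >> on a negative int64_t is an arithmetic shift (guaranteed since
// C++20 and the behavior of mainstream compilers before that).
inline uint64_t zigZag(int64_t v) {
    return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

// Bits needed to hold the largest zigzag value in the run. This is the
// per-element pass the commit avoids for SHORT_REPEAT and DELTA.
inline uint32_t maxZigZagBits(const std::vector<int64_t>& literals) {
    uint64_t maxVal = 0;
    for (int64_t v : literals) {
        uint64_t z = zigZag(v);
        if (z > maxVal) maxVal = z;
    }
    uint32_t bits = 0;
    while (maxVal != 0) { ++bits; maxVal >>= 1; }
    return bits;
}

// Hypothetical, simplified stand-ins for the encoder's types.
enum class Encoding { SHORT_REPEAT, DELTA, DIRECT };

struct Choice {
    Encoding encoding;
    bool zigzagComputed;  // whether the zigzag pass was actually run
    uint32_t zzBits;      // only meaningful when zigzagComputed is true
};

// Toy classifier: decide the encoding first, run the zigzag pass only on
// the branch that needs it.
Choice chooseEncoding(const std::vector<int64_t>& literals) {
    assert(!literals.empty());
    bool allEqual = true;
    bool fixedDelta = literals.size() > 1;
    uint64_t firstDelta = 0;
    for (std::size_t i = 1; i < literals.size(); ++i) {
        if (literals[i] != literals[0]) allEqual = false;
        // Unsigned subtraction wraps instead of overflowing.
        uint64_t d = static_cast<uint64_t>(literals[i]) -
                     static_cast<uint64_t>(literals[i - 1]);
        if (i == 1) firstDelta = d;
        else if (d != firstDelta) fixedDelta = false;
    }
    if (allEqual) return {Encoding::SHORT_REPEAT, false, 0};
    if (fixedDelta) return {Encoding::DELTA, false, 0};
    // Only DIRECT (and PATCHED_BASE in the real encoder) need zigzag bits.
    return {Encoding::DIRECT, true, maxZigZagBits(literals)};
}
```

In this sketch, a run such as {1, 3, 5, 7} is classified as DELTA without ever calling zigZag, which mirrors how the patch moves computeZigZagLiterals into the three code paths that end in DIRECT or proceed to the PATCHED_BASE check.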
