[ 
https://issues.apache.org/jira/browse/ORC-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707987#comment-16707987
 ] 

ASF GitHub Bot commented on ORC-444:
------------------------------------

wgtmac closed pull request #345: ORC-444: Fix errors in RLE section in ORC spec 
and improve RLEV2 encoder code.
URL: https://github.com/apache/orc/pull/345
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/c++/src/CMakeLists.txt b/c++/src/CMakeLists.txt
index 72d408d9b0..235ced856a 100644
--- a/c++/src/CMakeLists.txt
+++ b/c++/src/CMakeLists.txt
@@ -199,6 +199,7 @@ set(SOURCE_FILES
   OrcFile.cc
   Reader.cc
   RLEv1.cc
+  RLEV2Util.cc
   RleDecoderV2.cc
   RleEncoderV2.cc
   RLE.cc
diff --git a/c++/src/RLEV2Util.cc b/c++/src/RLEV2Util.cc
new file mode 100644
index 0000000000..53d18a0bd1
--- /dev/null
+++ b/c++/src/RLEV2Util.cc
@@ -0,0 +1,29 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "RLEV2Util.hh"
+
+namespace orc {
+
+  // Map FBS enum to bit width value.
+  const uint32_t FBSToBitWidthMap[FixedBitSizes::SIZE] = {
+    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
+    26, 28, 30, 32, 40, 48, 56, 64
+  };
+
+}
diff --git a/c++/src/RLEV2Util.hh b/c++/src/RLEV2Util.hh
index a7bc5537ab..794d5f62ab 100644
--- a/c++/src/RLEV2Util.hh
+++ b/c++/src/RLEV2Util.hh
@@ -22,26 +22,10 @@
 #include "RLEv2.hh"
 
 namespace orc {
+  extern const uint32_t FBSToBitWidthMap[FixedBitSizes::SIZE];
+
   inline uint32_t decodeBitWidth(uint32_t n) {
-    if (n <= FixedBitSizes::TWENTYFOUR) {
-      return n + 1;
-    } else if (n == FixedBitSizes::TWENTYSIX) {
-      return 26;
-    } else if (n == FixedBitSizes::TWENTYEIGHT) {
-      return 28;
-    } else if (n == FixedBitSizes::THIRTY) {
-      return 30;
-    } else if (n == FixedBitSizes::THIRTYTWO) {
-      return 32;
-    } else if (n == FixedBitSizes::FORTY) {
-      return 40;
-    } else if (n == FixedBitSizes::FORTYEIGHT) {
-      return 48;
-    } else if (n == FixedBitSizes::FIFTYSIX) {
-      return 56;
-    } else {
-      return 64;
-    }
+    return FBSToBitWidthMap[n];
   }
 
   inline uint32_t getClosestFixedBits(uint32_t n) {
diff --git a/site/specification/ORCv0.md b/site/specification/ORCv0.md
index 613298a086..336896ed77 100644
--- a/site/specification/ORCv0.md
+++ b/site/specification/ORCv0.md
@@ -438,7 +438,7 @@ values.
 * Run - a sequence of at least 3 identical values
 * Literals - a sequence of non-identical values
 
-The first byte of each group of values is a header than determines
+The first byte of each group of values is a header that determines
 whether it is a run (value between 0 to 127) or literal list (value
 between -128 to -1). For runs, the control byte is the length of the
 run minus the length of the minimal run (3) and the control byte for
diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index 78350800e7..b799adc6bd 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -444,7 +444,7 @@ values.
 * Run - a sequence of at least 3 identical values
 * Literals - a sequence of non-identical values
 
-The first byte of each group of values is a header than determines
+The first byte of each group of values is a header that determines
 whether it is a run (value between 0 to 127) or literal list (value
 between -128 to -1). For runs, the control byte is the length of the
 run minus the length of the minimal run (3) and the control byte for
@@ -622,7 +622,7 @@ if the series is increasing or decreasing.
   * 9 bits for run length (L) (1 to 512 values)
 * Base value - encoded as (signed or unsigned) varint
 * Delta base - encoded as signed varint
-* Delta values $W * (L - 2)$ bytes - encode each delta after the first
+* Delta values (W * (L - 2)) bytes - encode each delta after the first
   one. If the delta base is positive, the sequence is increasing and if it is
   negative the sequence is decreasing.
 
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 79f930e0ca..eb8b106ab5 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -463,7 +463,7 @@ values.
 * Run - a sequence of at least 3 identical values
 * Literals - a sequence of non-identical values
 
-The first byte of each group of values is a header than determines
+The first byte of each group of values is a header that determines
 whether it is a run (value between 0 to 127) or literal list (value
 between -128 to -1). For runs, the control byte is the length of the
 run minus the length of the minimal run (3) and the control byte for
@@ -641,7 +641,7 @@ if the series is increasing or decreasing.
   * 9 bits for run length (L) (1 to 512 values)
 * Base value - encoded as (signed or unsigned) varint
 * Delta base - encoded as signed varint
-* Delta values $W * (L - 2)$ bytes - encode each delta after the first
+* Delta values (W * (L - 2)) bytes - encode each delta after the first
   one. If the delta base is positive, the sequence is increasing and if it is
   negative the sequence is decreasing.
 


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Fix errors in RLE section in ORC spec and improve RLEV2 encoder code.
> ---------------------------------------------------------------------
>
>                 Key: ORC-444
>                 URL: https://issues.apache.org/jira/browse/ORC-444
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++, documentation
>            Reporter: Fang Zheng
>            Priority: Minor
>
> 1. Fixed a few typos and format errors in the RLE section in ORC 
> specification documentation.
> 2. Improved the decodeBitWidth() function in RLEV2Util.hh.
> Please see PR for details.
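
For illustration, the table-lookup approach that the diff introduces for
decodeBitWidth() can be sketched as a standalone snippet. This is a minimal
sketch, not the code merged in the PR: the names kBitWidths, kNumBitWidths and
decodeBitWidthChecked are hypothetical, and the bounds check is an addition
that the merged function does not perform. It assumes the encoded width is an
index in the range 0 to 31, matching the 32 entries of the table in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for FBSToBitWidthMap from the diff: the index is the
// encoded width, the value is the decoded bit width.
constexpr std::size_t kNumBitWidths = 32;
constexpr uint32_t kBitWidths[kNumBitWidths] = {
  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
  17, 18, 19, 20, 21, 22, 23, 24,
  26, 28, 30, 32, 40, 48, 56, 64
};

// Table lookup replaces the if/else chain. The range check is an assumption
// added for this sketch; it rejects indexes outside the table.
inline uint32_t decodeBitWidthChecked(uint32_t n) {
  if (n >= kNumBitWidths) {
    throw std::out_of_range("encoded bit width index out of range");
  }
  return kBitWidths[n];
}
```

Indexes 0 through 23 map to widths 1 through 24, as the removed
"return n + 1" branch did, and indexes 24 through 31 map to the wider
sizes 26 through 64.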



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
