[GitHub] [orc] stiga-huang commented on a diff in pull request #1375: ORC-1356: [C++] Use Intel AVX-512 instructions to accelerate the Rle-bit-packing decode

via GitHub Tue, 28 Mar 2023 06:35:28 -0700


stiga-huang commented on code in PR #1375:
URL: https://github.com/apache/orc/pull/1375#discussion_r1150600749



##########
README.md:
##########
@@ -93,3 +93,16 @@ To build only the C++ library:
 % make test-out
 
 ```
+
+To build the C++ library with AVX512 enabled:
+```shell
+Cmake option BUILD_ENABLE_AVX512 can be set to "ON" or (default value)"OFF" at 
the compile time. At compile time, it defines the SIMD level(AVX512) to be 
compiled into the binaries.
+Environment variable ORC_USER_SIMD_LEVEL can be set to "AVX512" or (default 
value)"NONE" at the run time. At run time, it defines the SIMD level to 
dispatch the code which can apply SIMD optimization. 
+Note that if ORC_USER_SIMD_LEVEL is set to "NONE" at run time, AVX512 will not 
take effect at run time even if BUILD_ENABLE_AVX512 is set to "ON" at compile 
time.

Review Comment:
   These 3 lines are too long and would be better to move outside the shell 
section. E.g.
   
   To build the C++ library with AVX512 enabled:
   ```shell
   export ORC_USER_SIMD_LEVEL=AVX512
   % mkdir build
   % cd build
   % cmake .. -DBUILD_JAVA=OFF -DBUILD_ENABLE_AVX512=ON
   % make package
   % make test-out
   ```
   Cmake option BUILD_ENABLE_AVX512 can be set to "ON" or (default value)"OFF" 
at the compile time. At compile time, it defines the SIMD level(AVX512) to be 
compiled into the binaries.
   
   Environment variable ORC_USER_SIMD_LEVEL can be set to "AVX512" or (default 
value)"NONE" at the run time. At run time, it defines the SIMD level to 
dispatch the code which can apply SIMD optimization. 
   Note that if ORC_USER_SIMD_LEVEL is set to "NONE" at run time, AVX512 will 
not take effect at run time even if BUILD_ENABLE_AVX512 is set to "ON" at 
compile time.



##########
c++/src/BpackingAvx512.cc:
##########
@@ -0,0 +1,4476 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "BpackingAvx512.hh"
+#include "BitUnpackerAvx512.hh"
+#include "CpuInfoUtil.hh"
+#include "RLEv2.hh"
+
+namespace orc {
+  UnpackAvx512::UnpackAvx512(RleDecoderV2* dec) : decoder(dec), 
unpackDefault(UnpackDefault(dec)) {
+    // PASS
+  }
+
+  UnpackAvx512::~UnpackAvx512() {
+    // PASS
+  }
+
+  void UnpackAvx512::vectorUnpack1(int64_t* data, uint64_t offset, uint64_t 
len) {
+    uint32_t bitWidth = 1;
+    const uint8_t* srcPtr = reinterpret_cast<const 
uint8_t*>(decoder->bufferStart);
+    uint32_t numElements = 0;
+    int64_t* dstPtr = data + offset;
+    uint64_t bufMoveByteLen = 0;
+    uint64_t bufRestByteLen = decoder->bufferEnd - decoder->bufferStart;
+    bool resetBuf = false;
+    uint64_t startBit = 0;
+    uint64_t tailBitLen = 0;
+    uint32_t backupByteLen = 0;
+
+    while (len > 0) {
+      if (startBit != 0) {
+        bufMoveByteLen +=
+            moveLen(len * bitWidth + startBit - ORC_VECTOR_BYTE_WIDTH, 
ORC_VECTOR_BYTE_WIDTH);
+      } else {
+        bufMoveByteLen += moveLen(len * bitWidth, ORC_VECTOR_BYTE_WIDTH);
+      }
+
+      if (bufMoveByteLen <= bufRestByteLen) {
+        numElements = len;
+        resetBuf = false;
+        len -= numElements;
+      } else {
+        if (startBit != 0) {
+          numElements =
+              (bufRestByteLen * ORC_VECTOR_BYTE_WIDTH + ORC_VECTOR_BYTE_WIDTH 
- startBit) /
+              bitWidth;
+          len -= numElements;
+          tailBitLen = fmod(
+              bufRestByteLen * ORC_VECTOR_BYTE_WIDTH + ORC_VECTOR_BYTE_WIDTH - 
startBit, bitWidth);
+          resetBuf = true;
+        } else {
+          numElements = (bufRestByteLen * ORC_VECTOR_BYTE_WIDTH) / bitWidth;
+          len -= numElements;
+          tailBitLen = fmod(bufRestByteLen * ORC_VECTOR_BYTE_WIDTH, bitWidth);
+          resetBuf = true;
+        }
+      }
+
+      if (tailBitLen != 0) {
+        backupByteLen = tailBitLen / ORC_VECTOR_BYTE_WIDTH;
+        tailBitLen = 0;
+      }
+
+      if (startBit > 0) {
+        uint32_t align = getAlign(startBit, bitWidth, 8);
+        if (align > numElements) {
+          align = numElements;
+        }
+        if (align != 0) {
+          bufMoveByteLen -=
+              moveLen(align * bitWidth + startBit - ORC_VECTOR_BYTE_WIDTH, 
ORC_VECTOR_BYTE_WIDTH);
+          plainUnpackLongs(dstPtr, 0, align, bitWidth, startBit);
+          srcPtr = reinterpret_cast<const uint8_t*>(decoder->bufferStart);
+          bufRestByteLen = decoder->bufferEnd - decoder->bufferStart;
+          dstPtr += align;
+          numElements -= align;
+        }
+      }

Review Comment:
   Codes of line 46 to 93 are similar (or identical?) in these methods, e.g. 
the same codes appear in vectorUnpack2 and vectorUnpack3. Can we extract them 
to reduce the method size?
   
   I think the core of these methods are the while-loop of `while (numElements 
>= VECTOR_UNPACK_8BIT_MAX_NUM)`. It'd be nice to refactor the codes dealing 
with boundaries and left these methods focus on this while-loop.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [orc] stiga-huang commented on a diff in pull request #1375: ORC-1356: [C++] Use Intel AVX-512 instructions to accelerate the Rle-bit-packing decode

Reply via email to