jatin-bhateja commented on code in PR #1011:
URL: https://github.com/apache/parquet-mr/pull/1011#discussion_r1096553161


##########
parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/ByteBitPackingVectorBenchmarks.java:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.benchmarks;
+
+import org.apache.parquet.column.values.bitpacking.BytePacker;
+import org.apache.parquet.column.values.bitpacking.Packer;
+import org.openjdk.jmh.annotations.*;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * This class uses the java17 vector API, add VM options --add-modules=jdk.incubator.vector
+ */
+
+@State(Scope.Benchmark)
+@BenchmarkMode(Mode.AverageTime)
+@Warmup(iterations = 1, batchSize = 100000)
+@Measurement(iterations = 1, batchSize = 100000)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+public class ByteBitPackingVectorBenchmarks {
+
+  /**
+   * The range of bitWidth is 1 ~ 32, change it directly if test other bitWidth.
+   */
+  private static final int bitWidth = 7;
+  private static final int outputValues = 1024;
+  private final byte[] input = new byte[outputValues * bitWidth / 8];
+  private final int[] output = new int[outputValues];
+  private final int[] outputVector = new int[outputValues];
+
+  @Setup(Level.Trial)
+  public void getInputBytes() {
+    for (int i = 0; i < input.length; i++) {
+      input[i] = (byte) i;
+    }
+  }
+
+  @Benchmark
+  public void testUnpack() {
+    BytePacker bytePacker = Packer.LITTLE_ENDIAN.newBytePacker(bitWidth);
+    for (int i = 0, j = 0; i < input.length; i += bitWidth, j += 8) {
+      bytePacker.unpack8Values(input, i, output, j);
+    }
+  }
+
+  @Benchmark
+  public void testUnpackVector() {
+    BytePacker bytePacker = Packer.LITTLE_ENDIAN.newBytePacker(bitWidth);
+    BytePacker bytePackerVector = Packer.LITTLE_ENDIAN.newBytePackerVector(bitWidth);

Review Comment:
   > Could you elaborate more? @jatin-bhateja
   
   The idea was to also emit the scalar routines in the vector packer, so that users can access both the scalar and the vector routines through a single vector packer instance.
   This can be addressed later, though: the scalar packer routines are currently generated at build time, while the vector packer routines are handcrafted. The existing scalar packers are nested static final classes, which makes extending them difficult.
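
   For illustration only, here is a minimal sketch of one way a single instance could expose both paths by delegation rather than inheritance, which avoids having to extend the generated final classes. The interfaces `Unpack8` and `UnpackVector`, the method name `unpackValuesVector`, and the class `CombinedPackerSketch` are all hypothetical and are not part of this PR; only the `unpack8Values(byte[], int, int[], int)` shape is taken from the benchmark above.

```java
// Hypothetical interface mirroring the scalar call used in the benchmark.
interface Unpack8 {
  void unpack8Values(byte[] in, int inPos, int[] out, int outPos);
}

// Hypothetical interface for a vectorized routine; the name is an assumption.
interface UnpackVector {
  void unpackValuesVector(byte[] in, int inPos, int[] out, int outPos);
}

// Wraps one scalar and one vector implementation and forwards each call,
// so callers only need to hold a single object.
final class CombinedPackerSketch implements Unpack8, UnpackVector {
  private final Unpack8 scalar;
  private final UnpackVector vector;

  CombinedPackerSketch(Unpack8 scalar, UnpackVector vector) {
    this.scalar = scalar;
    this.vector = vector;
  }

  @Override
  public void unpack8Values(byte[] in, int inPos, int[] out, int outPos) {
    // Forward to the wrapped scalar routine.
    scalar.unpack8Values(in, inPos, out, outPos);
  }

  @Override
  public void unpackValuesVector(byte[] in, int inPos, int[] out, int outPos) {
    // Forward to the wrapped vector routine.
    vector.unpackValuesVector(in, inPos, out, outPos);
  }
}
```

   With something along these lines, the benchmark would need only one packer object per bit width, and the build-time generator for the scalar classes would not have to change.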



##########
parquet-column/src/main/java/org/apache/parquet/column/values/bitpacking/ParquetReadRouter.java:
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.bitpacking;
+
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.EOFException;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * This is a utils class which is used for big data applications(such as Spark Flink)
+ *
+ * - For Intel CPU, Flags avx512vbmi && avx512_vbmi2 can have better performance gains
+ */
+public class ParquetReadRouter {
+  private static final Logger LOG = LoggerFactory.getLogger(ParquetReadRouter.class);
+
+  private static volatile Boolean vector;
+
+  public static void read(int bitWidth, ByteBufferInputStream in, int currentCount, int[] currentBuffer) throws IOException {
+    if (supportVector()) {
+      readBatchVector(bitWidth, in, currentCount, currentBuffer);
+    } else {
+      readBatchVector(bitWidth, in, currentCount, currentBuffer);

Review Comment:
   The else block should call `readBatch`; as written, both branches call `readBatchVector`, so the scalar path is never taken.
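
   A minimal sketch of the intended dispatch, assuming `readBatch` is the scalar counterpart of `readBatchVector`. The class name, the simplified parameter lists, the `vectorSupported` flag (standing in for `supportVector()`), and the `calls` log are placeholders for illustration, not the actual code in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ParquetReadRouter with stubbed read methods.
class ReadRouterSketch {
  // Records which path was taken, purely for demonstration.
  static final List<String> calls = new ArrayList<>();

  // Placeholder for the result of supportVector().
  static boolean vectorSupported = false;

  static void read(int bitWidth, int[] currentBuffer) {
    if (vectorSupported) {
      readBatchVector(bitWidth, currentBuffer);
    } else {
      // Scalar fallback: this is the call the else branch should make.
      readBatch(bitWidth, currentBuffer);
    }
  }

  static void readBatch(int bitWidth, int[] currentBuffer) {
    calls.add("readBatch");
  }

  static void readBatchVector(int bitWidth, int[] currentBuffer) {
    calls.add("readBatchVector");
  }
}
```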



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
