Dear Parquet team,
I am a software engineer at Intel. We have optimized Parquet bit-packing
encoding and decoding with jdk.incubator.vector (the Vector API) on OpenJDK 18,
which brings a significant performance improvement.
Would it be possible to contribute this optimization to parquet-mr?
Because the Vector API was added to OpenJDK in version 16, this optimization
requires JDK 16 or higher.
Our test results are below.
The functional test is based on the open-source parquet-mr bit-pack decoding
function:
public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos)
compared with our Vector API implementation:
public final void unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final int outPos)
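For readers who do not have the baseline in mind, here is a minimal scalar sketch of what unpacking 8 bit-packed values involves. It is not the parquet-mr code and not our vectorized implementation. The class name, the method names and the extra bitWidth parameter are hypothetical, and the sketch assumes values are packed least-significant bit first, so that 8 values of width w occupy exactly w bytes.

```java
import java.util.Arrays;

final class ScalarBitUnpack {

    // Hypothetical helper: packs 8 ints of `bitWidth` bits each into
    // `bitWidth` bytes, least-significant bit first (assumed layout).
    static void pack8ValuesScalar(int[] in, int inPos, byte[] out, int outPos, int bitWidth) {
        long mask = (1L << bitWidth) - 1;
        long buffer = 0;      // bits waiting to be written out
        int bitsInBuffer = 0; // number of valid bits in `buffer`
        for (int i = 0; i < 8; i++) {
            buffer |= (in[inPos + i] & mask) << bitsInBuffer;
            bitsInBuffer += bitWidth;
            while (bitsInBuffer >= 8) {
                out[outPos++] = (byte) buffer; // emit the low 8 bits
                buffer >>>= 8;
                bitsInBuffer -= 8;
            }
        }
    }

    // Hypothetical helper: reverses pack8ValuesScalar, reading `bitWidth`
    // bytes from `in` and writing 8 ints to `out`.
    static void unpack8ValuesScalar(byte[] in, int inPos, int[] out, int outPos, int bitWidth) {
        long mask = (1L << bitWidth) - 1;
        long buffer = 0;
        int bitsInBuffer = 0;
        for (int i = 0; i < 8; i++) {
            // Refill byte by byte until one full value is available.
            while (bitsInBuffer < bitWidth) {
                buffer |= (in[inPos++] & 0xFFL) << bitsInBuffer;
                bitsInBuffer += 8;
            }
            out[outPos + i] = (int) (buffer & mask);
            buffer >>>= bitWidth;
            bitsInBuffer -= bitWidth;
        }
    }

    public static void main(String[] args) {
        // The values 0..7 at bit width 3 pack into these three bytes.
        byte[] packed = {(byte) 0x88, (byte) 0xC6, (byte) 0xFA};
        int[] out = new int[8];
        unpack8ValuesScalar(packed, 0, out, 0, 3);
        System.out.println(Arrays.toString(out)); // [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```

The loop handles one value at a time with data-dependent shifts and masks, which is the kind of work a SIMD version can do for several values per instruction.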
We tested 10 pairs of decode functions (the open-source parquet-mr bit
unpacking vs. our vectorized SIMD implementation), one pair for each bit width
from 1 to 10. The results are below:
[inline image: functional test results for bit widths 1 to 10]
We also integrated our bit-packing decode implementation into parquet-mr and
tested batch reading through Spark's VectorizedParquetRecordReader, which reads
Parquet column data in batches. We constructed Parquet files with different row
and column counts. The column data type is Int32 and the maximum value is 127,
so the data can be bit-packed with a bit width of 7. The row count ranges from
10k to 100 million and the column count from 1 to 4.
[three inline images: VectorizedParquetRecordReader batch read test results]
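As a small worked check of why a maximum value of 127 corresponds to a bit width of 7, the sketch below derives the width from the maximum value and estimates the packed payload size. The class and method names are hypothetical, values are assumed to be non-negative, and the size estimate ignores run headers and page overhead.

```java
final class BitWidthSketch {

    // Smallest number of bits that can represent every value in [0, maxValue].
    static int bitWidthFor(int maxValue) {
        return 32 - Integer.numberOfLeadingZeros(maxValue);
    }

    // Bytes needed for the packed payload when values are stored in groups
    // of 8, each group taking `bitWidth` bytes (headers not included).
    static long packedBytes(long valueCount, int bitWidth) {
        long groups = (valueCount + 7) / 8; // round up to whole groups
        return groups * bitWidth;
    }

    public static void main(String[] args) {
        int width = bitWidthFor(127);
        System.out.println(width);                      // 7
        System.out.println(packedBytes(10_000, width)); // 8750
    }
}
```

For the smallest file in the test (10k rows), one column would take about 8,750 bytes of packed payload under these assumptions, compared with 40,000 bytes for plain 4-byte Int32 values.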