Hi dev@,

I’ve been working on performance improvements across the main encoding/decoding hot paths of Apache Parquet Java. I presented this work during last week’s Parquet community sync and I am sharing a summary here for broader visibility, in line with Apache best practices.
Using AI-assisted tools and JMH, I expanded the existing microbenchmark coverage of the critical hot paths. I then iterated on a series of optimizations, validated them for correctness, and reviewed them with other AI tools.

The results are promising. The improvements focus on eliminating per-value overhead in the hot loops without changing the file format or public API.

Key changes:

- Plain INT32/LONG: bulk System.arraycopy instead of per-value ByteBuffer.putInt (~4x encode, ~3x decode)
- ByteStreamSplit: zero-allocation batch scatter/gather (3-5x encode, 2x decode)
- Dictionary encoding: custom open-addressing hash map replacing java.util.HashMap (up to 80x for low-cardinality string columns)
- RLE dictionary index decoder: direct ByteBuffer access bypassing InputStream
- New batch read APIs: readIntegers()/readLongs() for vectorized consumers

End-to-end file read/write throughput improves by ~13–14% on average across codecs in my test suite (Java 11, AMD EPYC).

Full JMH results (303 benchmarks) and a more detailed write-up will follow.

Most changes have been grouped and tracked under the following issue, which provides background and links to the related pull requests:

https://github.com/apache/parquet-java/issues/3530

The first set of pull requests is ready for review. Feedback and comments from Java committers would be greatly appreciated.

Thanks,
Ismaël

PS: Kudos to Fokko Driesprong, who has already started reviewing some of them.
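
For readers less familiar with ByteStreamSplit, the scatter/gather idea can be sketched roughly as follows for FLOAT values. This is only an illustration of the layout defined by the Parquet format (byte k of every value goes into stream k, and the streams are concatenated), not the code from the pull requests. The class name ByteStreamSplitSketch and the encode/decode helpers are hypothetical, and the sketch assumes a whole batch is available as a float[] and allocates its output array once per batch.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of parquet-java.
public class ByteStreamSplitSketch {

    /** Scatter: byte k (little-endian) of value i is written to out[k * n + i]. */
    static byte[] encode(float[] values) {
        int n = values.length;
        byte[] out = new byte[n * 4]; // single allocation for the whole batch
        for (int i = 0; i < n; i++) {
            int bits = Float.floatToRawIntBits(values[i]);
            out[i] = (byte) bits;                  // stream 0: least significant byte
            out[n + i] = (byte) (bits >>> 8);      // stream 1
            out[2 * n + i] = (byte) (bits >>> 16); // stream 2
            out[3 * n + i] = (byte) (bits >>> 24); // stream 3: most significant byte
        }
        return out;
    }

    /** Gather: rebuild value i from the i-th byte of each of the four streams. */
    static float[] decode(byte[] encoded) {
        int n = encoded.length / 4;
        float[] values = new float[n];
        for (int i = 0; i < n; i++) {
            int bits = (encoded[i] & 0xFF)
                    | (encoded[n + i] & 0xFF) << 8
                    | (encoded[2 * n + i] & 0xFF) << 16
                    | (encoded[3 * n + i] & 0xFF) << 24;
            values[i] = Float.intBitsToFloat(bits);
        }
        return values;
    }

    public static void main(String[] args) {
        float[] input = {1.0f, -2.5f, 3.25f};
        float[] roundTrip = decode(encode(input));
        System.out.println(Arrays.equals(input, roundTrip)); // prints true
    }
}
```

The point of working on a whole batch is that the inner loop touches only primitive arrays, so there is no per-value object allocation or per-value buffer call.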
