iemejia opened a new pull request, #3570: URL: https://github.com/apache/parquet-java/pull/3570
Part of #3530 (Apache Parquet Java Performance Improvements)

## Summary

Bypass the Hadoop `CompressionCodec` abstraction for all six supported codecs, eliminating per-page codec-pool lookups, stream-wrapper allocation, and unnecessary buffer copies in both `CodecFactory` and `DirectCodecFactory`.

| Codec | Before | After |
|-------|--------|-------|
| **Snappy** | Hadoop `SnappyCodec` stream wrappers | xerial `Snappy.compress`/`uncompress` direct calls |
| **LZ4_RAW** | Hadoop codec abstraction | airlift `LZ4Compressor`/`LZ4Decompressor` direct |
| **ZSTD** | Streaming `ZstdOutputStreamNoFinalizer`/`ZstdInputStreamNoFinalizer` | Reusable `ZstdCompressCtx`/`ZstdDecompressCtx` single-call APIs |
| **GZIP** | Hadoop `GzipCodec` with codec-pool overhead | JDK `GZIPOutputStream`/`GZIPInputStream` direct |
| **LZO** | GPL `com.hadoop.compression.lzo.LzoCodec` | aircompressor `LzoHadoopStreams` (Apache 2.0, wire-compatible) |
| **Brotli** | Abandoned `brotli-codec` (jbrotli, 2016, x86-only) | `brotli4j` 1.23.0 (10 platforms incl. aarch64, reflection-loaded) |

Notable side effects:

- **LZO**: Removes the GPL dependency; uses Apache 2.0 aircompressor. Wire-compatible framing.
- **Brotli**: Enables aarch64 support (Linux, macOS, Windows). Removes non-aarch64 Maven profile guards and test skips.

JMH benchmarks: `CompressionBenchmark`, `CpuReadBenchmark`, `CpuWriteBenchmark`, `FileReadBenchmark`, `FileWriteBenchmark`, `ConcurrentReadWriteBenchmark`.

## Benchmark results

**Environment**: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.
**End-to-end file write** (100K rows, SingleShotTime, ms/op, lower is better; cells show before -> after):

| Codec | V1 dict=true | V2 dict=true | V2 Speedup |
|---|---|---|---:|
| SNAPPY | 50.6 -> 40.9 (1.24x) | 69.7 -> 38.7 | **1.80x** |
| ZSTD | 52.3 -> 43.6 (1.20x) | 70.7 -> 40.6 | **1.74x** |
| LZ4_RAW | 49.6 -> 41.3 (1.20x) | 70.2 -> 39.0 | **1.80x** |
| GZIP | 149.9 -> 119.3 (1.26x) | 123.4 -> 67.6 | **1.83x** |
| BROTLI | 55.4 -> 46.8 (1.18x) | 72.8 -> 41.8 | **1.74x** |

**End-to-end file read** (ms/op, lower is better):

| Codec | V1 Speedup | V2 Speedup |
|---|---:|---:|
| SNAPPY | **1.50x** | **1.61x** |
| ZSTD | **1.49x** | **1.60x** |
| LZ4_RAW | **1.23x** | **1.57x** |
| GZIP | **1.47x** | **1.49x** |
| BROTLI | **1.83x** | **1.91x** |

**Raw codec throughput** (`DirectCodecFactory`): Snappy/ZSTD/LZ4/GZIP unchanged (already had native access). Brotli decompression improved **2.3-2.7x** (brotli4j >> jbrotli).

V2 shows consistently larger speedups than V1 because V2 encoding produces more, smaller pages, meaning more codec invocations per file where the per-invocation Hadoop overhead accumulates.
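Since each page is compressed in its own call, the cost of a single call matters. As a rough illustration of the GZIP row in the summary table, the sketch below round-trips one page-sized buffer using only the JDK's `GZIPOutputStream` and `GZIPInputStream`, with no Hadoop codec or codec pool involved. The class and method names (`DirectGzipSketch`, `compress`, `decompress`) are hypothetical and are not taken from the PR; the actual change may manage buffers and streams differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not the PR's code: a direct JDK GZIP round trip for one page buffer.
public final class DirectGzipSketch {

    private DirectGzipSketch() {}

    /** Compresses one page-sized buffer using java.util.zip only. */
    public static byte[] compress(byte[] page) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(Math.max(32, page.length / 2));
        // Closing the stream (via try-with-resources) flushes data and writes the GZIP trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(page);
        }
        return sink.toByteArray();
    }

    /** Decompresses a GZIP-framed buffer back into the original page bytes. */
    public static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Repetitive sample data standing in for a page; real pages are encoded column values.
        byte[] page = "parquet page bytes ".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(page);
        byte[] restored = decompress(packed);
        System.out.println("round trip ok: " + Arrays.equals(page, restored));
        System.out.println("compressed smaller: " + (packed.length < page.length));
    }
}
```

The sketch allocates fresh streams per call for clarity; it only shows that the JDK classes are sufficient on their own, and says nothing about how the PR reuses state across pages.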

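The speedup figures in the write table appear to be plain before/after ratios of the ms/op values. A small sketch that recomputes two of them from the reported numbers, assuming that reading is correct (the class name `SpeedupCheck` is made up for this example):

```java
import java.util.Locale;

// Illustrative only: recomputes speedups as before/after ratios of the reported ms/op values.
public final class SpeedupCheck {

    private SpeedupCheck() {}

    /** Speedup factor for a lower-is-better metric: time before divided by time after. */
    public static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Formats the speedup with two decimals and an "x" suffix, as in the tables. */
    public static String format(double beforeMs, double afterMs) {
        return String.format(Locale.ROOT, "%.2fx", speedup(beforeMs, afterMs));
    }

    public static void main(String[] args) {
        // Values copied from the end-to-end file write table (V2, dict=true).
        System.out.println("SNAPPY V2: " + format(69.7, 38.7)); // prints "SNAPPY V2: 1.80x"
        System.out.println("GZIP V2:   " + format(123.4, 67.6)); // prints "GZIP V2:   1.83x"
    }
}
```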