This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new 66375ec4a ORC-2082: Support Parquet LZ4 in bench module
66375ec4a is described below
commit 66375ec4a1dc42e9afc952037908417d33f82721
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Fri Feb 6 17:46:38 2026 -0800
ORC-2082: Support Parquet LZ4 in bench module
### What changes were proposed in this pull request?
This PR aims to support Parquet LZ4 in the bench module.
### Why are the changes needed?
To allow benchmarking Parquet with `LZ4`, like the other codecs.
### How was this patch tested?
Manually ran the following commands.
**BUILD**
```
$ cd java
$ mvn package -DskipTests -Pbenchmark
```
**WRITE**
```
$ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c lz4 -f parquet
Processing sales [parquet]
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.lz4]
```
**FILE NAME**
```
$ ls -alR data/generated/sales
total 13396024
drwxr-xr-x 4 dongjoon staff 128 Feb 6 16:51 .
drwxr-xr-x 3 dongjoon staff 96 Feb 6 14:50 ..
-rw-r--r-- 1 dongjoon staff 3768120878 Feb 6 16:53 parquet.lz4
```
**READ**
```
$ java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -d sales -c lz4 -f parquet
...
[main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 10 ms. row count = 374588
data/generated/sales/parquet.lz4 rows: 25000000 batches: 24415
```
**PARQUET**
```
$ parquet meta data/generated/sales/parquet.lz4 | head -n3
File path: data/generated/sales/parquet.lz4
Created by: parquet-mr version 1.17.0 (build fac0c746532e133beb928a7f6a7e57b510b477a1)
$ parquet footer data/generated/sales/parquet.lz4 | grep -i LZ | sort | uniq
"codec" : "LZ4_RAW",
```
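The `LZ4_RAW` codec reported in the footer follows from the mapping this patch adds: the bench `LZ4` kind is translated to Parquet's `CompressionCodecName.LZ4_RAW`. A minimal, self-contained sketch of that mapping is shown here. The class name `CodecMappingSketch` and the enums `BenchCompression` and `ParquetCodec` are hypothetical stand-ins so the example runs without the ORC or Parquet jars, and `BROTLI` is included only to exercise the unhandled-kind path; the real code uses the bench module's own compression kind type and `org.apache.parquet.hadoop.metadata.CompressionCodecName`.

```java
// Sketch only: mirrors the shape of the switch in ParquetWriter,
// using hypothetical local enums instead of the real ORC/Parquet types.
final class CodecMappingSketch {

  // Hypothetical stand-in for the bench module's compression kinds.
  enum BenchCompression { SNAPPY, ZSTD, LZ4, BROTLI }

  // Hypothetical stand-in for Parquet's CompressionCodecName.
  enum ParquetCodec { SNAPPY, ZSTD, LZ4_RAW }

  // Translate a bench compression kind into the Parquet codec to write with.
  static ParquetCodec toParquetCodec(BenchCompression kind) {
    switch (kind) {
      case SNAPPY:
        return ParquetCodec.SNAPPY;
      case ZSTD:
        return ParquetCodec.ZSTD;
      case LZ4:
        // The case added by this patch: LZ4 is written as LZ4_RAW.
        return ParquetCodec.LZ4_RAW;
      default:
        // Kinds without a mapping are rejected, as in the original switch.
        throw new IllegalArgumentException("Unhandled compression type " + kind);
    }
  }

  public static void main(String[] args) {
    for (BenchCompression kind : BenchCompression.values()) {
      try {
        System.out.println(kind + " -> " + toParquetCodec(kind));
      } catch (IllegalArgumentException e) {
        System.out.println(kind + " -> " + e.getMessage());
      }
    }
  }
}
```

Before this change, a request for `-c lz4 -f parquet` would have fallen through to the `default` branch and failed with the `Unhandled compression type` error.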
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: `Opus 4.5` on `Claude Code`
Closes #2521 from dongjoon-hyun/ORC-2082.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java | 2 ++
1 file changed, 2 insertions(+)
diff --git a/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java b/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java
index 413ed21fb..7b73b9a31 100644
--- a/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java
+++ b/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java
@@ -51,6 +51,8 @@ public class ParquetWriter implements BatchWriter {
         return CompressionCodecName.SNAPPY;
       case ZSTD:
         return CompressionCodecName.ZSTD;
+      case LZ4:
+        return CompressionCodecName.LZ4_RAW;
       default:
         throw new IllegalArgumentException("Unhandled compression type " + kind);
     }