This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/main by this push:
     new 66375ec4a ORC-2082: Support Parquet LZ4 in bench module
66375ec4a is described below

commit 66375ec4a1dc42e9afc952037908417d33f82721
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Fri Feb 6 17:46:38 2026 -0800

    ORC-2082: Support Parquet LZ4 in bench module
    
    ### What changes were proposed in this pull request?
    
    This PR adds Parquet LZ4 support to the bench module by mapping the `LZ4` compression kind to Parquet's `LZ4_RAW` codec.
    
    ### Why are the changes needed?
    
    To benchmark `LZ4` like the other codecs.
    
    ### How was this patch tested?
    
    Manually ran the following commands.
    
    **BUILD**
    ```
    $ cd java
    
    $ mvn package -DskipTests -Pbenchmark
    ```
    
    **WRITE**
    ```
    $ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c lz4 -f parquet
    Processing sales [parquet]
    [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.lz4]
    ```
    
    **FILE NAME**
    ```
    $ ls -alR data/generated/sales
    total 13396024
    drwxr-xr-x 4 dongjoon  staff         128 Feb  6 16:51 .
    drwxr-xr-x 3 dongjoon  staff          96 Feb  6 14:50 ..
    -rw-r--r-- 1 dongjoon  staff  3768120878 Feb  6 16:53 parquet.lz4
    ```
    
    **READ**
    ```
    $ java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -d sales -c lz4 -f parquet
    ...
    [main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 10 ms. row count = 374588
    data/generated/sales/parquet.lz4 rows: 25000000 batches: 24415
    ```
    
    **PARQUET**
    ```
    $ parquet meta data/generated/sales/parquet.lz4 | head -n3
    
    File path:  data/generated/sales/parquet.lz4
    Created by: parquet-mr version 1.17.0 (build fac0c746532e133beb928a7f6a7e57b510b477a1)
    
    $ parquet footer data/generated/sales/parquet.lz4 | grep -i LZ | sort | uniq
            "codec" : "LZ4_RAW",
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: `Opus 4.5` on `Claude Code`
    
    Closes #2521 from dongjoon-hyun/ORC-2082.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java   | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java b/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java
index 413ed21fb..7b73b9a31 100644
--- a/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java
+++ b/java/bench/core/src/java/org/apache/orc/bench/core/convert/parquet/ParquetWriter.java
@@ -51,6 +51,8 @@ public class ParquetWriter implements BatchWriter {
         return CompressionCodecName.SNAPPY;
       case ZSTD:
         return CompressionCodecName.ZSTD;
+      case LZ4:
+        return CompressionCodecName.LZ4_RAW;
       default:
         throw new IllegalArgumentException("Unhandled compression type " + kind);
     }
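
For readers without the ORC tree checked out, the mapping touched by this hunk can be sketched in isolation. This is a minimal illustration, not the actual `ParquetWriter` code: the enums `Kind` and `ParquetCodec` are hypothetical stand-ins for the bench module's compression kind and Parquet's `CompressionCodecName`, and `UNMAPPED_EXAMPLE` exists only to exercise the `default` branch.

```java
/** Sketch only: stand-in enums replace the real ORC bench and Parquet types. */
final class CodecMappingSketch {
  /** Hypothetical stand-in for the bench module's compression kinds. */
  enum Kind { SNAPPY, ZSTD, LZ4, UNMAPPED_EXAMPLE }

  /** Hypothetical stand-in for Parquet's CompressionCodecName. */
  enum ParquetCodec { SNAPPY, ZSTD, LZ4_RAW }

  /** Maps a bench compression kind to the codec name used when writing Parquet. */
  static ParquetCodec toParquetCodec(Kind kind) {
    switch (kind) {
      case SNAPPY:
        return ParquetCodec.SNAPPY;
      case ZSTD:
        return ParquetCodec.ZSTD;
      case LZ4:
        // Mirrors the two lines added by this commit: LZ4 maps to LZ4_RAW.
        return ParquetCodec.LZ4_RAW;
      default:
        // Any kind without an explicit case is rejected, as in the diff context.
        throw new IllegalArgumentException("Unhandled compression type " + kind);
    }
  }

  public static void main(String[] args) {
    System.out.println(toParquetCodec(Kind.LZ4));
    try {
      toParquetCodec(Kind.UNMAPPED_EXAMPLE);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The `LZ4_RAW` result matches the `"codec" : "LZ4_RAW"` line reported by `parquet footer` in the test output above.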
