merrily01 opened a new issue, #772:
URL: https://github.com/apache/auron/issues/772
**Describe the bug**
Blaze does not support the LZO compression codec for Parquet tables: querying LZO-compressed Parquet data with Blaze enabled fails in the native ParquetScan with an Arrow "NYI: The codec type LZO is not supported yet" error, while the same query succeeds with Blaze disabled.
**To Reproduce**
Steps to reproduce the behavior:
1. Create an LZO-compressed Parquet table:
```
CREATE TABLE IF NOT EXISTS people_35 (id INT, name STRING, age INT)
USING parquet OPTIONS (compression 'lzo');
```
2. Insert data into the test table:
```
INSERT INTO people_35 VALUES
  (1, 'Alice', 25), (2, 'Bob', 30), (3, 'Charlie', 35),
  (4, 'David', 40), (5, 'Eve', 45);
```
3. Query the test table with Blaze enabled/disabled:
```
set spark.blaze.enable=true;  -- repeat with false to compare
select * from people_35;
```
With `spark.blaze.enable=true`, querying the LZO-compressed Parquet table fails with the following error (with `spark.blaze.enable=false`, the query succeeds):
```
spark-sql (blaze_db)> 25/01/17 17:17:02 WARN TaskSetManager: Lost task 3.3 in stage 0.0 (TID 18) : TaskKilled (Stage cancelled: Job aborted due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 (TID 17) (executor 4): java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[ParquetScan] error: Execution error: output_with_sender[ParquetScan]: output() returns error: Arrow error: External error: NYI: The codec type LZO is not supported yet
    at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
    at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:95)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:143)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:662)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:682)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
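The "NYI" message comes from the native (Arrow-based) Parquet reader, not from Spark itself. As a workaround sketch until LZO is supported (the target table name is illustrative, and it assumes a codec such as snappy is handled by the native reader), the data can be rewritten with a supported codec so the query runs with Blaze enabled:

```
-- Workaround sketch: rewrite the table with a codec the native reader handles
-- (people_35_snappy is a hypothetical name; snappy assumed supported).
CREATE TABLE people_35_snappy
USING parquet OPTIONS (compression 'snappy')
AS SELECT * FROM people_35;

SET spark.blaze.enable=true;
SELECT * FROM people_35_snappy;
```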
**Screenshots**
<img width="1366" alt="Image"
src="https://github.com/user-attachments/assets/159adb2f-d52f-4840-a6d9-64128940e7a0"
/>
**Additional context**
Spark Version: 3.5
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]