merrily01 opened a new issue, #772:
URL: https://github.com/apache/auron/issues/772
**Describe the bug**
Blaze does not support the LZO compression codec for Parquet tables: querying LZO-compressed Parquet data with Blaze enabled fails in the native ParquetScan with an Arrow "NYI: The codec type LZO is not supported yet" error, while the same query succeeds with Blaze disabled.
**To Reproduce**
Steps to reproduce the behavior:
1. Create an LZO-compressed Parquet table:
```
CREATE TABLE IF NOT EXISTS people_35 (id INT, name STRING, age INT)
USING parquet OPTIONS (compression 'lzo');
```
2. Insert data into the test table:
```
INSERT INTO people_35 VALUES
  (1, 'Alice', 25), (2, 'Bob', 30), (3, 'Charlie', 35),
  (4, 'David', 40), (5, 'Eve', 45);
```
3. Query the test table with Blaze enabled/disabled:
```
set spark.blaze.enable=true;  -- repeat with false to compare
select * from people_35;
```
With `spark.blaze.enable=true`, querying the LZO-compressed Parquet table fails with the following error (with `spark.blaze.enable=false`, the query succeeds):
```
spark-sql (blaze_db)> 25/01/17 17:17:02 WARN TaskSetManager: Lost task 3.3 in stage 0.0 (TID 18) : TaskKilled (Stage cancelled: Job aborted due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 (TID 17) (executor 4): java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[ParquetScan] error: Execution error: output_with_sender[ParquetScan]: output() returns error: Arrow error: External error: NYI: The codec type LZO is not supported yet
    at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
    at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:95)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:143)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:662)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:682)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
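The "NYI" message comes from the native (Arrow-based) Parquet reader, not from Spark itself. As a workaround sketch until LZO is supported (the target table name is illustrative, and it assumes a codec such as snappy is handled by the native reader), the data can be rewritten with a supported codec so the query runs with Blaze enabled:

```
-- Workaround sketch: rewrite the table with a codec the native reader handles
-- (people_35_snappy is a hypothetical name; snappy assumed supported).
CREATE TABLE people_35_snappy
USING parquet OPTIONS (compression 'snappy')
AS SELECT * FROM people_35;

SET spark.blaze.enable=true;
SELECT * FROM people_35_snappy;
```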
**Screenshots**
<img width="1366" alt="Image"
src="https://github.com/user-attachments/assets/159adb2f-d52f-4840-a6d9-64128940e7a0"
/>
**Additional context**
Spark Version: 3.5
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]