wenwj0 opened a new issue, #6897:
URL: https://github.com/apache/incubator-gluten/issues/6897
### Backend
VL (Velox)
### Bug description
Environment:
- debian-10.13
- Spark-3.2.2
- gluten-1.2.0 rc2
- jdk: openjdk-8
When executing the SQL below, the job fails and throws `free():
invalid pointer`:
```sql
select count(udid) from table where ds = '20240731';
```
The Gluten plan is:
```
== Fallback Summary ==
(1) Scan hive xxxtable: Unsupported file format for UnknownFormat.
== Physical Plan ==
VeloxColumnarToRowExec (7)
+- ^ ProjectExecTransformer (5)
+- ^ InputIteratorTransformer (4)
+- RowToVeloxColumnar (2)
+- Scan hive xxxtable (1)
```
The table's input format is `SequenceFileInputFormat`:
```
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
```
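For reference, a table with this storage layout can be created as below. This is a hypothetical minimal repro sketch: the table and column names are illustrative, not the real (anonymized) ones from the failing job.

```sql
-- Hypothetical repro table: Hive SequenceFile storage, which the Velox
-- backend does not support, so the scan falls back to vanilla Spark and
-- rows are converted through RowToVeloxColumnar.
CREATE TABLE repro_table (udid STRING)
PARTITIONED BY (ds STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS SEQUENCEFILE;
```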
The error message is:
```
[2024-08-16 17:58:22.422]Container exited with a non-zero exit code 134.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 23131 Aborted    /home/hadoop/java-8-sun/bin/java -server -Xmx4096m
Last 4096 bytes of stderr :
[spark.gluten.sql.session.timeZone.default, Asia/Hong_Kong]
[spark.sql.caseSensitive, false]
I20240816 17:58:22.259377 82482 FileSystems.cpp:161] LocalFileSystem::mkdir /disk4/yarn/local/usercache/hadoop/appcache/xxx
I20240816 17:58:22.321003 82482 Task.cpp:1136] All drivers (1) finished for task Gluten_Stage_3_TID_12_VTID_0 after running for 51 ms.
I20240816 17:58:22.321048 82482 Task.cpp:1876] Terminating task Gluten_Stage_3_TID_12_VTID_0 with state Finished after running for 51 ms.
I20240816 17:58:22.321477 82482 WholeStageResultIterator.cc:319] Native Plan with stats for: [Stage: 3 TID: 12]
-- Aggregation[1][PARTIAL n1_0 := count_partial("n0_0")] -> n1_0:BIGINT
   Output: 1 rows (32B, 1 batches), Cpu time: 133.99us, Blocked wall time: 0ns, Peak memory: 64.25KB, Memory allocations: 3, Threads: 1
      hashtable.capacity sum: 0, count: 1, min: 0, max: 0
      hashtable.numDistinct sum: 0, count: 1, min: 0, max: 0
      hashtable.numRehashes sum: 0, count: 1, min: 0, max: 0
      hashtable.numTombstones sum: 0, count: 1, min: 0, max: 0
      queuedWallNanos sum: 7.00us, count: 1, min: 7.00us, max: 7.00us
      runningAddInputWallNanos sum: 99.58us, count: 1, min: 99.58us, max: 99.58us
      runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
      runningGetOutputWallNanos sum: 30.72us, count: 1, min: 30.72us, max: 30.72us
-- ValueStream[0][] -> n0_0:VARCHAR
   Input: 0 rows (0B, 0 batches), Output: 1271 rows (71.81KB, 1 batches), Cpu time: 48.92ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
      runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
      runningFinishWallNanos sum: 9.92us, count: 1, min: 9.92us, max: 9.92us
      runningGetOutputWallNanos sum: 49.98ms, count: 1, min: 49.98ms, max: 49.98ms
free(): invalid pointer
.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2455)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2404)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2403)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2403)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2643)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2585)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2574)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
at org.apache.kyuubi.operation.ExecuteStatement.waitStatementComplete(ExecuteStatement.scala:135)
at org.apache.kyuubi.operation.ExecuteStatement.$anonfun$runInternal$1(ExecuteStatement.scala:173)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) (state=,code=0)
```
We have set `spark.gluten.sql.debug=true`, but the only useful
information it yields is `free(): invalid pointer`.
I suspect something is wrong in `RowToVeloxColumnar`.
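As a possible workaround while this is investigated, disabling Gluten for the affected session keeps the query on vanilla Spark and avoids the `RowToVeloxColumnar` path entirely. This is a sketch that assumes the session-level `spark.gluten.enabled` switch from Gluten's configuration guide applies here:

```sql
-- Workaround sketch: run the query without Gluten.
-- (Assumes spark.gluten.enabled can be toggled per session.)
SET spark.gluten.enabled=false;
select count(udid) from table where ds = '20240731';
```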
### Spark version
Spark-3.2.x
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]