wenwj0 opened a new issue, #6897:
URL: https://github.com/apache/incubator-gluten/issues/6897

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   Environment:
   - debian-10.13
   - Spark-3.2.2
   - gluten-1.2.0 rc2
   - jdk: openjdk-8
   
   When executing the SQL below, the job fails and the executor aborts with `free(): invalid pointer`:
   ```sql
   select count(udid) from table where ds = '20240731';
   ```
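
   For reference, a minimal reproduction sketch (the table name `xxxtable` and the DDL here are hypothetical, inferred from the storage information reported below):

   ```sql
   -- Hypothetical DDL matching the reported storage information:
   -- a partitioned Hive table stored as SequenceFile with LazySimpleSerDe.
   CREATE TABLE xxxtable (udid STRING)
   PARTITIONED BY (ds STRING)
   STORED AS SEQUENCEFILE;

   SELECT count(udid) FROM xxxtable WHERE ds = '20240731';
   ```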
   
   The Gluten plan is:
   ```
   == Fallback Summary ==
   (1) Scan hive xxxtable: Unsupported file format for UnknownFormat.
   
   == Physical Plan ==
   VeloxColumnarToRowExec (7)
   +- ^ ProjectExecTransformer (5)
      +- ^ InputIteratorTransformer (4)
         +- RowToVeloxColumnar (2)
            +- Scan hive xxxtable (1)
   ```
   
   The table's file format is `SequenceFileInputFormat`:
   ```
   # Storage Information
   SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   InputFormat:         org.apache.hadoop.mapred.SequenceFileInputFormat
   OutputFormat:        org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
   ```
   
   The error message is:
   
   ```
   [2024-08-16 17:58:22.422]Container exited with a non-zero exit code 134. Error file: prelaunch.err.
   Last 4096 bytes of prelaunch.err :
   /bin/bash: line 1: 23131 Aborted               /home/hadoop/java-8-sun/bin/java -server -Xmx4096m 
   Last 4096 bytes of stderr :
   
    [spark.gluten.sql.session.timeZone.default, Asia/Hong_Kong]
    [spark.sql.caseSensitive, false]
   I20240816 17:58:22.259377 82482 FileSystems.cpp:161] LocalFileSystem::mkdir /disk4/yarn/local/usercache/hadoop/appcache/xxx
   I20240816 17:58:22.321003 82482 Task.cpp:1136] All drivers (1) finished for task Gluten_Stage_3_TID_12_VTID_0 after running for 51 ms.
   I20240816 17:58:22.321048 82482 Task.cpp:1876] Terminating task Gluten_Stage_3_TID_12_VTID_0 with state Finished after running for 51 ms.
   I20240816 17:58:22.321477 82482 WholeStageResultIterator.cc:319] Native Plan with stats for: [Stage: 3 TID: 12]
   -- Aggregation[1][PARTIAL n1_0 := count_partial("n0_0")] -> n1_0:BIGINT
      Output: 1 rows (32B, 1 batches), Cpu time: 133.99us, Blocked wall time: 0ns, Peak memory: 64.25KB, Memory allocations: 3, Threads: 1
         hashtable.capacity           sum: 0, count: 1, min: 0, max: 0
         hashtable.numDistinct        sum: 0, count: 1, min: 0, max: 0
         hashtable.numRehashes        sum: 0, count: 1, min: 0, max: 0
         hashtable.numTombstones      sum: 0, count: 1, min: 0, max: 0
         queuedWallNanos              sum: 7.00us, count: 1, min: 7.00us, max: 7.00us
         runningAddInputWallNanos     sum: 99.58us, count: 1, min: 99.58us, max: 99.58us
         runningFinishWallNanos       sum: 0ns, count: 1, min: 0ns, max: 0ns
         runningGetOutputWallNanos    sum: 30.72us, count: 1, min: 30.72us, max: 30.72us
     -- ValueStream[0][] -> n0_0:VARCHAR
        Input: 0 rows (0B, 0 batches), Output: 1271 rows (71.81KB, 1 batches), Cpu time: 48.92ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
           runningAddInputWallNanos     sum: 0ns, count: 1, min: 0ns, max: 0ns
           runningFinishWallNanos       sum: 9.92us, count: 1, min: 9.92us, max: 9.92us
           runningGetOutputWallNanos    sum: 49.98ms, count: 1, min: 49.98ms, max: 49.98ms
   
   free(): invalid pointer
   .
   Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2455)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2404)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2403)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2403)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2643)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2585)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2574)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

        at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
        at org.apache.kyuubi.operation.ExecuteStatement.waitStatementComplete(ExecuteStatement.scala:135)
        at org.apache.kyuubi.operation.ExecuteStatement.$anonfun$runInternal$1(ExecuteStatement.scala:173)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748) (state=,code=0)
   ```
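
   As a side note on reading the failure: YARN container exit code 134 is 128 + 6, i.e. the JVM was killed by SIGABRT, which glibc raises on heap corruption such as `free(): invalid pointer`. This is standard signal arithmetic, nothing Gluten-specific:

   ```python
   import signal

   def decode_exit_code(code: int) -> str:
       """Map a shell/YARN exit code to the signal that killed the process.

       Codes above 128 conventionally mean "terminated by signal (code - 128)".
       """
       if code > 128:
           return signal.Signals(code - 128).name
       return f"exited normally with status {code}"

   # Container exit code 134 from the log above: 134 - 128 = 6, i.e. SIGABRT,
   # the signal glibc raises on an invalid free / heap corruption.
   print(decode_exit_code(134))  # SIGABRT
   ```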
   
   We have set `spark.gluten.sql.debug=true`, but the only useful information it yields is `free(): invalid pointer`.
   
   I suspect something is wrong in `RowToVeloxColumnar`.
   
   ### Spark version
   
   Spark-3.2.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

