lifulong opened a new issue, #10436:
URL: https://github.com/apache/incubator-gluten/issues/10436

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   "E20250814 14:12:01.533819 3578786 Exceptions.h:70] Line: 
/home/lifulong/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:2105,
 Function:terminate, Expression:  Cancelled, Source: RUNTIME, ErrorCode: 
INVALID_STATE
   I20250814 14:12:01.533937 3578786 Task.cpp:2117] Terminating task 
Gluten_Stage_12_TID_123112_VTID_3 with state Canceled after running for 3m 48s"
   
   The two lines above are the only error messages in the Spark executor log; 
I found no other error information. I tried adding logging around memory 
arbitration and the wait timeout to locate the root cause, but it produced no 
results. Does anyone have ideas for troubleshooting this issue further?
   
   Below is more information about the test runs:
   
   The Spark SQL job always fails with 1.5G of off-heap memory per core.
   With 3G of off-heap memory per core, the Spark SQL job fails intermittently; 
if the spark.task.maxFailures and spark.yarn.max.executor.failures configs are 
increased, the SQL job always succeeds.
   
   The SQL runs with spark.gluten.sql.columnar.forceShuffledHashJoin=false so 
that sort merge join is used, and each task spills 500M of data to disk.
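
For anyone trying to reproduce this, the setup described above can be roughly sketched as a set of Spark confs. This is only an illustration: the helper name `build_confs`, the executor core count, and the retry limits are placeholders and not the values from the failing job. It also assumes that "off-heap per core" means `spark.memory.offHeap.size` divided by `spark.executor.cores`.

```python
def build_confs(executor_cores, offheap_gb_per_core,
                max_task_failures, max_executor_failures):
    """Build Spark confs for a given off-heap budget per core.

    Hypothetical helper: the core count and retry limits passed in are
    placeholders, not the settings of the failing job.
    """
    # Total executor off-heap = cores * per-core budget, expressed in MiB.
    offheap_mb = int(executor_cores * offheap_gb_per_core * 1024)
    return {
        "spark.executor.cores": str(executor_cores),
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": f"{offheap_mb}m",
        # From the report: disable forced shuffled hash join so that
        # sort merge join is used.
        "spark.gluten.sql.columnar.forceShuffledHashJoin": "false",
        # Raised retry limits, which let the 3G-per-core run succeed.
        "spark.task.maxFailures": str(max_task_failures),
        "spark.yarn.max.executor.failures": str(max_executor_failures),
    }


def to_submit_args(confs):
    """Render the confs as spark-submit style --conf arguments."""
    return [f"--conf {key}={value}" for key, value in confs.items()]


# 1.5G per core (the always-failing case) with an assumed 4 cores per executor.
failing = build_confs(4, 1.5, max_task_failures=4, max_executor_failures=8)
print("\n".join(to_submit_args(failing)))
```

Calling `build_confs` with `offheap_gb_per_core=3` gives the configuration for the intermittently failing case.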
   
    
   
   ### Gluten version
   
   Gluten-1.4
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   
   
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

