NEUpanning opened a new issue, #10104:
URL: https://github.com/apache/incubator-gluten/issues/10104

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   The total time across all tasks was 78.1 hours for vanilla Spark, but 
**1899.3** hours for Gluten. The flame graph shows that most of the time is 
spent merging payloads. After adding some logs, I see that the merge operation 
ran 1522330 times for 1084128 rows in a single task, with each merge taking x 
milliseconds.
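
To put the merge count in perspective, here is a minimal sketch of the arithmetic using only the two figures quoted above. The per-merge cost is not given in this report (it is written as "x"), so `merge_time_seconds` takes it as a hypothetical parameter rather than assuming a value.

```python
# Figures quoted in the report for a single task.
MERGE_CALLS = 1_522_330
ROWS = 1_084_128

# Average number of merge operations per row in that task.
merges_per_row = MERGE_CALLS / ROWS


def merge_time_seconds(ms_per_merge: float) -> float:
    """Total merge time for the task, given a hypothetical per-merge cost in ms."""
    return MERGE_CALLS * ms_per_merge / 1000.0


print(f"merges per row: {merges_per_row:.2f}")
```

More than one merge per row suggests the merge work grows with the row count rather than with the number of batches, so even a small per-merge cost adds up over a task.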
   
   Flame graph:
   <img width="1202" alt="Image" src="https://github.com/user-attachments/assets/3ec70fb5-52c5-45c3-afc9-163fdd36a026" />
   
   Gluten shuffle metrics:
   ```
   shuffle records written: 39,407,858,231
   shuffle write time total (min, med, max (stageId: taskId))
   17.86 h (0 ms, 27.4 s, 17.1 m (stage 0.0: task 499))
   time to compress total (min, med, max (stageId: taskId))
   31.52 h (0 ms, 29.6 s, 20.3 m (stage 0.0: task 1191))
   time to split total (min, med, max (stageId: taskId))
   1478.09 h (0 ms, 45.5 m, 1.71 h (stage 0.0: task 86))
   time to spill total (min, med, max (stageId: taskId))
   15.77 h (0 ms, 25.2 s, 16.3 m (stage 0.0: task 499))
   ```
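
As a rough sketch, the totals above can be compared against the overall task time from the bug description. This assumes the shuffle metric totals and the 1899.3-hour figure are both sums across tasks and therefore comparable, which the report does not state explicitly. The dictionary keys are illustrative labels, not Gluten metric names.

```python
# Totals quoted in the report, in hours.
VANILLA_HOURS = 78.1
GLUTEN_HOURS = 1899.3
shuffle_hours = {
    "write": 17.86,
    "compress": 31.52,
    "split": 1478.09,
    "spill": 15.77,
}

# Overall slowdown of Gluten relative to vanilla Spark.
slowdown = GLUTEN_HOURS / VANILLA_HOURS

# Fraction of total Gluten task time attributed to split.
split_share = shuffle_hours["split"] / GLUTEN_HOURS

# Metric with the largest total.
dominant = max(shuffle_hours, key=shuffle_hours.get)

print(f"slowdown: {slowdown:.1f}x")
print(f"split share of total task time: {split_share:.0%}")
print(f"dominant shuffle metric: {dominant}")
```

Under that assumption, split time accounts for roughly three quarters of the total task time, while write, compress, and spill are each small by comparison.
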
   shuffle schema:
   
   <img width="203" alt="Image" src="https://github.com/user-attachments/assets/c3466aee-b5d3-46b0-80db-5b208ac393eb" />
   
   
   ### Gluten version
   
   Gluten-1.3
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

