NEUpanning opened a new issue, #10104: URL: https://github.com/apache/incubator-gluten/issues/10104
### Backend

VL (Velox)

### Bug description

The total time across all tasks was 78.1 hours for vanilla Spark but reached **1899.3** hours for Gluten. The flame graph shows that most of the time is spent merging payloads. After adding some logs, I see that the merge operation occurred 1,522,330 times for 1,084,128 rows in a single task, with each instance taking x milliseconds.

Flame graph:

<img width="1202" alt="Image" src="https://github.com/user-attachments/assets/3ec70fb5-52c5-45c3-afc9-163fdd36a026" />

Gluten shuffle metrics:

```
shuffle records written: 39,407,858,231
shuffle write time total (min, med, max (stageId: taskId))
17.86 h (0 ms, 27.4 s, 17.1 m (stage 0.0: task 499))
time to compress total (min, med, max (stageId: taskId))
31.52 h (0 ms, 29.6 s, 20.3 m (stage 0.0: task 1191))
time to split total (min, med, max (stageId: taskId))
1478.09 h (0 ms, 45.5 m, 1.71 h (stage 0.0: task 86))
time to spill total (min, med, max (stageId: taskId))
15.77 h (0 ms, 25.2 s, 16.3 m (stage 0.0: task 499))
```

Shuffle schema:

<img width="203" alt="Image" src="https://github.com/user-attachments/assets/c3466aee-b5d3-46b0-80db-5b208ac393eb" />

### Gluten version

Gluten-1.3

### Spark version

Spark-3.5.x

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

```bash
```
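
For scale, the figures reported above can be put in proportion with a few lines of arithmetic. This is only a sketch: the values are copied from the report, the variable names are illustrative, and it assumes that the "time to split" total is counted within the total task time.

```python
# Figures copied from the report above; variable names are illustrative only.
vanilla_total_h = 78.1      # total task time, vanilla Spark (hours)
gluten_total_h = 1899.3     # total task time, Gluten (hours)
split_total_h = 1478.09     # "time to split" total from the shuffle metrics (hours)
merge_calls = 1_522_330     # merge operations logged in one task
rows_in_task = 1_084_128    # rows processed by that task

# How much slower the Gluten run is overall.
slowdown = gluten_total_h / vanilla_total_h

# Fraction of Gluten's total task time attributed to splitting
# (assumption: split time is a subset of total task time).
split_share = split_total_h / gluten_total_h

# Average number of merge operations per row in the logged task.
merges_per_row = merge_calls / rows_in_task

print(f"slowdown vs vanilla Spark: {slowdown:.1f}x")
print(f"split share of total task time: {split_share:.1%}")
print(f"merge calls per row: {merges_per_row:.2f}")
```

Under that assumption, splitting accounts for roughly three quarters of the total task time, and the logged task performs more than one merge per row on average.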
