[I] [EPIC] Improve cluster observability [datafusion-ballista]

via GitHub Thu, 29 Jan 2026 06:27:52 -0800


milenkovicm opened a new issue, #1426:
URL: https://github.com/apache/datafusion-ballista/issues/1426


   Two major directions (as of now): 
   
   ## Collect Executor Statistics
   
   As a starting point, we could collect and aggregate executor 
statistics, similar to the [Spark UI Executors 
tab](https://spark.apache.org/docs/3.5.7/web-ui.html#executors-tab).
   
   ![](https://spark.apache.org/docs/3.5.7/img/webui-exe-tab.png)
   
   Some statistics, such as memory utilisation, would be collected on the 
executor; others, such as shuffle read and write, may be collected on the 
scheduler side. 
   
   We would need to add a REST interface to expose the collected 
metrics. 
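   
   To make the aggregation step concrete, here is a minimal sketch of how 
per-executor snapshots might be kept and summed on the scheduler. Everything 
in it is an assumption for illustration: `ExecutorStats`, `StatsRegistry`, and 
the field names are hypothetical and are not part of the existing Ballista 
API. A REST handler could serialize the result of `cluster_totals` or `get`. 

```rust
use std::collections::HashMap;

/// Hypothetical per-executor snapshot; names are illustrative only.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ExecutorStats {
    /// Sampled on the executor itself.
    pub memory_used_bytes: u64,
    /// Could instead be tracked on the scheduler side.
    pub shuffle_read_bytes: u64,
    pub shuffle_write_bytes: u64,
    pub completed_tasks: u64,
}

/// Hypothetical scheduler-side store holding the latest snapshot per executor.
#[derive(Debug, Default)]
pub struct StatsRegistry {
    per_executor: HashMap<String, ExecutorStats>,
}

impl StatsRegistry {
    /// Record the latest snapshot for an executor, replacing any previous one.
    pub fn report(&mut self, executor_id: &str, stats: ExecutorStats) {
        self.per_executor.insert(executor_id.to_string(), stats);
    }

    /// Look up the latest snapshot for a single executor.
    pub fn get(&self, executor_id: &str) -> Option<&ExecutorStats> {
        self.per_executor.get(executor_id)
    }

    /// Sum the latest snapshots across all known executors.
    pub fn cluster_totals(&self) -> ExecutorStats {
        self.per_executor
            .values()
            .fold(ExecutorStats::default(), |mut acc, s| {
                acc.memory_used_bytes += s.memory_used_bytes;
                acc.shuffle_read_bytes += s.shuffle_read_bytes;
                acc.shuffle_write_bytes += s.shuffle_write_bytes;
                acc.completed_tasks += s.completed_tasks;
                acc
            })
    }
}
```

   Keeping only the latest snapshot per executor keeps the sketch simple; a 
real design would also need to decide how to handle executors that stop 
reporting. 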
   
   ## Per Stage Flame Graph
   
   Similar to the [Nvidia RAPIDS Per Stage Flame 
Graph](https://nvidia.github.io/spark-rapids/docs/additional-functionality/per-stage-flamegraph.html),
 we could collect stats per stage and produce flame graphs. 
   
   We would need to investigate further what should be done. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

