Hi all,

Over the past few weeks we have implemented running job performance monitoring (JPM) for MR and Spark. In both the MR and Spark JPMs, we fetch the list of running jobs from the YARN ResourceManager, retrieve each job's details through the REST APIs that YARN supports, and save the parsed results to HBase. This job information can later be used to analyze performance and generate alerts.

To implement this functionality in a streaming fashion, we built a Storm topology for each JPM. Each topology has one spout, which fetches the running job list from YARN and recovers from ZooKeeper when it is restarted, and a set of bolts that handle each job.
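To make the spout's polling step more concrete, here is a minimal Python sketch. It assumes the standard YARN ResourceManager Cluster Applications API (`/ws/v1/cluster/apps` with the `states` and `applicationTypes` query parameters). The helper names (`running_apps_url`, `parse_running_apps`, `diff_jobs`, `bolt_index`) are hypothetical and are not the classes in the pull request. The real JPMs are Storm topologies written in Java. The sketch shows one way a single poll could be turned into per-job work: build the query URL, parse the JSON response, compare against previously known job IDs to find started and finished jobs, and route each job to a bolt with a stable hash.

```python
import json
import zlib
from urllib.parse import urlencode


def running_apps_url(rm_base: str, app_type: str) -> str:
    """Build a ResourceManager REST URL that lists RUNNING apps of one type."""
    query = urlencode({"states": "RUNNING", "applicationTypes": app_type})
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps?{query}"


def parse_running_apps(body: str) -> list:
    """Return the list of app dicts; the RM sends {"apps": null} when none match."""
    apps = json.loads(body).get("apps") or {}
    return apps.get("app", [])


def diff_jobs(known_ids: set, apps: list) -> tuple:
    """Compare the current poll with previously known job IDs.

    Assumption: known_ids would be restored from ZooKeeper after a restart;
    here it is just an in-memory set.
    """
    current_ids = {app["id"] for app in apps}
    started = current_ids - known_ids
    finished = known_ids - current_ids
    return started, finished


def bolt_index(app_id: str, num_bolts: int) -> int:
    """Stable hash routing so a given job always maps to the same bolt task."""
    return zlib.crc32(app_id.encode("utf-8")) % num_bolts


if __name__ == "__main__":
    # Made-up sample response in the shape returned by /ws/v1/cluster/apps.
    sample = json.dumps({"apps": {"app": [
        {"id": "application_1460000000000_0001", "applicationType": "MAPREDUCE", "state": "RUNNING"},
        {"id": "application_1460000000000_0002", "applicationType": "MAPREDUCE", "state": "RUNNING"},
    ]}})
    print(running_apps_url("http://rm-host:8088/", "MAPREDUCE"))
    apps = parse_running_apps(sample)
    known = {"application_1460000000000_0000", "application_1460000000000_0001"}
    started, finished = diff_jobs(known, apps)
    print("started:", sorted(started), "finished:", sorted(finished))
    for app in apps:
        print(app["id"], "-> bolt", bolt_index(app["id"], 4))
```

In an actual Storm topology, the spout would emit each job ID, and a fields grouping on that ID would give the same per-job affinity that `bolt_index` imitates here. CRC32 is used only because Python's built-in `hash` for strings is randomized per process.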
For more details, please see https://github.com/apache/incubator-eagle/pull/309

Thanks,
Jinhu
