Hi all,

We have added support for MR and Spark history job monitoring (JPM) to Apache Eagle. It is meant to analyze the performance of history jobs and generate alerts. For now, it only covers data ingestion.

For MR JPM, it reads the log files of finished jobs from HDFS, parses the log and configuration files, and saves the results to the backend storage, which is currently HBase. For Spark JPM, it fetches the finished job IDs from the Resource Manager, asks the Spark History Server for the log file locations of those jobs, parses the log files, and saves the results to the backend storage, which is also HBase.

To meet these requirements in a streaming way and achieve higher availability, both MR and Spark JPM run as Storm topologies. The spout reads the MR history log files or fetches the finished Spark job IDs from the Resource Manager, and the bolts handle the remaining logic.

We will add features for history job performance analysis and alerting later. For more details, please see https://issues.apache.org/jira/browse/EAGLE-276

Thanks,
Jinhu Wu
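
P.S. For anyone unfamiliar with the spout/bolt split, here is a rough sketch of the data flow in plain Python. It is not the Eagle code and does not use the Storm API: the function names, the `key=value` log format, the sample data, and the dict standing in for HBase are all assumptions made up for illustration. It only shows the shape of the pipeline: a source emits finished job IDs, one stage parses each job's log, and a final stage writes the result to storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class JobRecord:
    job_id: str
    fields: Dict[str, str]


def spout(finished_job_ids: Iterable[str]) -> Iterator[str]:
    """Stands in for the spout: emits one finished job id at a time."""
    for job_id in finished_job_ids:
        yield job_id


def parse_bolt(job_ids: Iterable[str],
               read_log: Callable[[str], List[str]]) -> Iterator[JobRecord]:
    """Stands in for a parsing bolt: turns 'key=value' lines into a record."""
    for job_id in job_ids:
        fields: Dict[str, str] = {}
        for line in read_log(job_id):
            key, sep, value = line.partition("=")
            if sep:  # skip lines that are not in key=value form
                fields[key.strip()] = value.strip()
        yield JobRecord(job_id, fields)


def store_bolt(records: Iterable[JobRecord],
               storage: Dict[str, Dict[str, str]]) -> int:
    """Stands in for the storage bolt: a dict replaces HBase here."""
    count = 0
    for record in records:
        storage[record.job_id] = record.fields
        count += 1
    return count


# Hypothetical sample data; real job history logs use a different format.
FAKE_LOGS = {
    "job_1": ["user=alice", "durationMs=1200"],
    "job_2": ["user=bob", "malformed line", "durationMs=300"],
}

storage: Dict[str, Dict[str, str]] = {}
stored = store_bolt(parse_bolt(spout(FAKE_LOGS), lambda j: FAKE_LOGS[j]), storage)
print(stored, storage["job_2"])
```

In the real topology each stage would be a separate Storm component, so the stages can run in parallel and be restarted independently. The generators here only mimic the order in which data moves.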
