[ 
https://issues.apache.org/jira/browse/EAGLE-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated EAGLE-276:
--------------------------
    Description: 
This adds MR and Spark history job monitoring (JPM) to Apache Eagle, which is
used to analyze the performance of history jobs and generate alerts. For now,
both contain only data ingestion.

MR JPM reads the log files of finished jobs from HDFS, parses the log and
configuration files, and saves the results to the backend storage, which is
currently HBase.

Spark JPM fetches the IDs of finished jobs from the ResourceManager, asks the
Spark history server for the log file locations of those jobs, parses the log
files, and saves the results to the backend storage, which is also HBase.
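
The Spark JPM lookup described above can be sketched as follows. This is only
an illustration, not Eagle's code: the function names and host names are
hypothetical, and the endpoint paths are the public YARN ResourceManager and
Spark history server REST APIs. Whether Eagle calls exactly these endpoints is
an assumption.

```python
from urllib.parse import urlencode


def finished_spark_apps_url(rm_base, finished_after_ms):
    """Build the ResourceManager query for Spark apps finished after a timestamp."""
    query = urlencode({
        "states": "FINISHED",
        "applicationTypes": "SPARK",
        "finishedTimeBegin": finished_after_ms,
    })
    return f"{rm_base}/ws/v1/cluster/apps?{query}"


def extract_app_ids(rm_response):
    """Pull application IDs out of a decoded ResourceManager JSON response.

    The "apps" value may be null when nothing matches, so both levels
    are guarded before the list is read.
    """
    apps = (rm_response.get("apps") or {}).get("app") or []
    return [app["id"] for app in apps]


def history_server_app_url(shs_base, app_id):
    """Build the Spark history server URL describing one application."""
    return f"{shs_base}/api/v1/applications/{app_id}"
```

A caller would fetch the first URL, pass the decoded JSON to
extract_app_ids, and then query the history server once per returned ID.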

To meet these requirements in a streaming way and achieve higher availability,
both MR and Spark JPM run as Storm topologies. The spout reads MR history log
files or fetches the IDs of finished Spark jobs from the ResourceManager, and
the bolts handle the remaining logic.
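
The spout/bolt split described above can be pictured with a small sketch. It
deliberately avoids the Storm API and uses plain Python classes; all class and
function names here are hypothetical, and the single-threaded loop only stands
in for what Storm would distribute across workers.

```python
from collections import deque


class JobIdSpout:
    """Source stage: hands out one finished-job ID per call."""

    def __init__(self, job_ids):
        self._pending = deque(job_ids)

    def next_tuple(self):
        # Return None when there is nothing left to emit.
        return self._pending.popleft() if self._pending else None


class ParseBolt:
    """Middle stage: turns a job ID into a parsed record."""

    def __init__(self, parse):
        self._parse = parse  # caller-supplied parsing function

    def execute(self, job_id):
        return self._parse(job_id)


class StoreBolt:
    """Final stage: stands in for the backend storage with a dict."""

    def __init__(self):
        self.rows = {}

    def execute(self, record):
        self.rows[record["job_id"]] = record


def run_topology(spout, parse_bolt, store_bolt):
    """Drain the spout, pushing each job through parse and store."""
    while True:
        job_id = spout.next_tuple()
        if job_id is None:
            break
        store_bolt.execute(parse_bolt.execute(job_id))
    return store_bolt.rows
```

Keeping ingestion in the spout and parsing and storage in bolts means each
stage can be scaled or replaced separately, which is the property the
topology design relies on.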

Performance analysis of history jobs and alerting will be added later.

  was:
As an administrator, I want to monitor the Spark jobs running in my cluster.
Data for the following jobs will be collected:

- Spark history jobs
- MR history jobs


> JPM - Spark&MR job history monitoring
> -------------------------------------
>
>                 Key: EAGLE-276
>                 URL: https://issues.apache.org/jira/browse/EAGLE-276
>             Project: Eagle
>          Issue Type: New Feature
>    Affects Versions: v0.4.0
>            Reporter: Jing Ge
>            Assignee: wujinhu
>              Labels: JPM
>
> This adds MR and Spark history job monitoring (JPM) to Apache Eagle, which
> is used to analyze the performance of history jobs and generate alerts. For
> now, both contain only data ingestion.
> MR JPM reads the log files of finished jobs from HDFS, parses the log and
> configuration files, and saves the results to the backend storage, which is
> currently HBase.
> Spark JPM fetches the IDs of finished jobs from the ResourceManager, asks
> the Spark history server for the log file locations of those jobs, parses
> the log files, and saves the results to the backend storage, which is also
> HBase.
> To meet these requirements in a streaming way and achieve higher
> availability, both MR and Spark JPM run as Storm topologies. The spout reads
> MR history log files or fetches the IDs of finished Spark jobs from the
> ResourceManager, and the bolts handle the remaining logic.
> Performance analysis of history jobs and alerting will be added later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
