Haibo Chen commented on MAPREDUCE-7065:

CC [~rohithsharma] [~vrushalic]. Is it OK at this point to change how MR data 
is stored in ATSv2, for example split MR_TASK into MR_MAP_TASK and 

> Improve information stored in ATSv2 for MR jobs
> -----------------------------------------------
>                 Key: MAPREDUCE-7065
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
> While exploring the possibility of retrieving every piece of information that 
> JHS presents today through ATSv2, I found a few improvements we can make.
> 1) MR tasks are split by type in JHS, map tasks or reduce tasks. In ATSv2 they 
> are stored indistinguishably as entities of type MR_TASK. We can split MR_TASK 
> 2) Task attempt final states are stored in the events, so we cannot use 
> infofilter to group task attempts by final state, which is what JHS does.
> 3) Display names of counters are not stored in JHS. We are currently storing 
> (counter name, display name, value) as a metric (counter name, value). We can 
> potentially store (counter name, display name) as an info. Similarly for 
> sources of Job configuration properties.
> 4) Job level counters and configuration properties are stored both in 
> ApplicationTable and EntityTable. It's probably safe just to store MR 
> specific counters in EntityTable.
> One general problem I see around this area in MR:
> 1) We can precompute # of failed/killed/successful map/reduce task attempts 
> and average map/reduce/shuffle/merge time in the AM. This would avoid 
> iterating over all task attempts when JHS serves the Job Overview Page.
> To fully replace JHS with ATSv2, three functionalities need to be supported 
> by ATSv2
> 1) /apps/ query so that a list of all jobs can be retrieved
> 2) support streaming api to get all generic entities (YARN-5627)
> 3) support per-app data retention policy. Likely a setting in TimelineWriter 
> that allows admins to specify how long information of a given application 
> should be kept, in the form of TTL in HBase.
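
The precomputation idea from the "general problem" item in the description could look roughly like the sketch below: the AM keeps running totals as task attempts finish, so the overview numbers are available without scanning every attempt afterwards. This is only an illustration in plain Java. AttemptSummary, Attempt, TaskType, FinalState and elapsedMs are hypothetical names made up for this sketch, not Hadoop or ATSv2 classes, and only a single elapsed time per attempt is tracked (no separate shuffle/merge times).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: none of these types come from Hadoop or ATSv2.
public class AttemptSummary {
    public enum TaskType { MAP, REDUCE }
    public enum FinalState { SUCCEEDED, FAILED, KILLED }

    // Minimal stand-in for a finished task attempt.
    public static final class Attempt {
        final TaskType type;
        final FinalState state;
        final long elapsedMs;

        public Attempt(TaskType type, FinalState state, long elapsedMs) {
            this.type = type;
            this.state = state;
            this.elapsedMs = elapsedMs;
        }
    }

    // Running totals, updated once per finished attempt.
    private final Map<TaskType, Map<FinalState, Integer>> counts =
            new EnumMap<>(TaskType.class);
    private final Map<TaskType, Long> successfulElapsedMs =
            new EnumMap<>(TaskType.class);

    // Called when an attempt reaches its final state.
    public void record(Attempt a) {
        counts.computeIfAbsent(a.type, t -> new EnumMap<>(FinalState.class))
              .merge(a.state, 1, Integer::sum);
        if (a.state == FinalState.SUCCEEDED) {
            successfulElapsedMs.merge(a.type, a.elapsedMs, Long::sum);
        }
    }

    // Number of attempts of the given type that ended in the given state.
    public int count(TaskType type, FinalState state) {
        Map<FinalState, Integer> byState = counts.get(type);
        return byState == null ? 0 : byState.getOrDefault(state, 0);
    }

    // Average elapsed time of successful attempts; 0.0 if there are none.
    public double averageSuccessfulMs(TaskType type) {
        int n = count(type, FinalState.SUCCEEDED);
        if (n == 0) {
            return 0.0;
        }
        return (double) successfulElapsedMs.getOrDefault(type, 0L) / n;
    }

    public static void main(String[] args) {
        AttemptSummary summary = new AttemptSummary();
        summary.record(new Attempt(TaskType.MAP, FinalState.SUCCEEDED, 100));
        summary.record(new Attempt(TaskType.MAP, FinalState.SUCCEEDED, 300));
        summary.record(new Attempt(TaskType.MAP, FinalState.FAILED, 50));
        summary.record(new Attempt(TaskType.REDUCE, FinalState.KILLED, 10));
        System.out.println("failed maps: "
                + summary.count(TaskType.MAP, FinalState.FAILED));
        System.out.println("avg successful map ms: "
                + summary.averageSuccessfulMs(TaskType.MAP));
    }
}
```

The resulting totals would then be written once per job (for example as info or metrics on the job entity), which is a design assumption of this sketch rather than something the issue specifies.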
