Vrushali C commented on MAPREDUCE-7065:

I believe it should be okay to change the field in ATSv2, since ATSv2 is still 
in alpha2 as of Hadoop 3.0 and Hadoop 2.9, per the latest YARN-5355 commit. 

But apart from ATSv2, other downstream applications may depend on the entity 
names. Also, I am not sure whether ATSv1 requires the names to stay the same, 
for instance. 

For point (2), task attempt final state: in general, I think we can consider 
storing some "important" fields as special columns in addition to their regular 
storage location. This will make querying the data easier. As long as those 
fields are not huge, it should be okay to store them as additional columns. 
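
As a rough illustration of the idea (not actual ATSv2 code), the sketch below 
copies the final state from a task attempt's last event into a top-level info 
key, so that attempts can be grouped by a simple key lookup instead of by 
walking events. The entity layout, the key name FINAL_STATE, and both helper 
functions are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical task-attempt entities; all names here are illustrative only.
attempts = [
    {"id": "attempt_m_0", "info": {},
     "events": [{"id": "STARTED", "info": {}},
                {"id": "FINISHED", "info": {"STATUS": "SUCCEEDED"}}]},
    {"id": "attempt_m_1", "info": {},
     "events": [{"id": "STARTED", "info": {}},
                {"id": "FINISHED", "info": {"STATUS": "FAILED"}}]},
    {"id": "attempt_r_0", "info": {},
     "events": [{"id": "STARTED", "info": {}},
                {"id": "FINISHED", "info": {"STATUS": "SUCCEEDED"}}]},
]


def promote_final_state(entity):
    """Copy the status of the last event into a top-level info key."""
    final_event = entity["events"][-1]
    entity["info"]["FINAL_STATE"] = final_event["info"]["STATUS"]
    return entity


def group_by_info(entities, key):
    """Group entity ids by the value of one info key (no event scan needed)."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["info"][key]].append(entity["id"])
    return dict(groups)


grouped = group_by_info([promote_final_state(a) for a in attempts],
                        "FINAL_STATE")
```

The duplicated value is a short string per attempt, which fits the "not huge" 
condition above.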

I am not sure I completely understand (3). Counters are stored as metrics, so 
do we need them additionally in the info column family? 

I am reading through the rest of the jira, so I will get back with a more 
detailed response soon. 

> Improve information stored in ATSv2 for MR jobs
> -----------------------------------------------
>                 Key: MAPREDUCE-7065
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
> While exploring the possibility of retrieving every piece of information that 
> JHS presents today through ATSv2, I found a few improvements we can make.
> 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
> indistinguishably stored as entities of type MR_TASK. We can split MR_TASK 
> 2) Task attempt final states are stored in the events, so we cannot use 
> infofilter to group task attempts by final state, which is what JHS does.
> 3) Display names of counters are not stored in JHS. We are currently storing 
> (counter name, display name, value) as a metric (counter name, value). We can 
> potentially store (counter name, display name) as an info. Similarly for 
> sources of Job configuration properties.
> 4) Job level counters and configuration properties are stored both in 
> ApplicationTable and EntityTable. It's probably safe just to store MR 
> specific counters in EntityTable.
> One general problem I see around this area in MR:
> 1) We can precompute # of failed/killed/successful map/reduce task attempts 
> and average map/reduce/shuffle/merge time in the AM. This would avoid 
> iterating over all task attempts when JHS serves the Job Overview Page.
> To fully replace JHS with ATSv2, three functionalities need to be supported 
> by ATSv2:
> 1) /apps/ query so that a list of all jobs can be retrieved (YARN-6058)
> 2) support streaming api to get all generic entities (YARN-5627)
> 3) support per-app data retention policy. Likely a setting in TimelineWriter 
> that allows admins to specify how long information of a given application 
> should be kept, in the form of TTL in HBase.
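
For reference, the precomputation mentioned in the quoted description (counts 
of failed/killed/successful attempts and average times per task type) could 
look roughly like the sketch below. It is a single pass over in-memory 
records; the record fields (task_type, state, elapsed_ms) and the function 
name are assumptions made for this example, not names from MR or ATSv2.

```python
def summarize_attempts(attempts):
    """Count attempts per (task type, state) and average the elapsed time
    of successful attempts per task type, in one pass."""
    counts = {}
    elapsed = {}
    for attempt in attempts:
        key = (attempt["task_type"], attempt["state"])
        counts[key] = counts.get(key, 0) + 1
        if attempt["state"] == "SUCCEEDED":
            elapsed.setdefault(attempt["task_type"], []).append(
                attempt["elapsed_ms"])
    averages = {t: sum(v) / len(v) for t, v in elapsed.items()}
    return {"counts": counts, "avg_elapsed_ms": averages}


# Hypothetical attempt records, as the AM might hold them at job end.
sample = [
    {"task_type": "MAP", "state": "SUCCEEDED", "elapsed_ms": 1000},
    {"task_type": "MAP", "state": "SUCCEEDED", "elapsed_ms": 3000},
    {"task_type": "MAP", "state": "FAILED", "elapsed_ms": 500},
    {"task_type": "REDUCE", "state": "SUCCEEDED", "elapsed_ms": 4000},
]

summary = summarize_attempts(sample)
```

Publishing such a summary once with the job entity would let a reader render 
an overview page without fetching every task attempt.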
