[ https://issues.apache.org/jira/browse/MAPREDUCE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927523#action_12927523 ]
Scott Chen commented on MAPREDUCE-2125:
---------------------------------------
{quote}
The problem is not small jobs but short tasks in jobs with a large number of
tasks. We happen to have a system that generates jobs with 50k to 100k tasks
per job, each with only a few MB per split. If you have multiple such jobs in
different queues (or under any shared scheduler that is not strictly FIFO),
you can see a high job completion rate for these large jobs after a while.
Arguably, these jobs could be optimized to use a proper input format with
fewer splits (hence fewer tasks), but I'd like to point out that such
workloads exist.
{quote}
I see. Now I understand why you are worried.
{quote}
Another issue with the patch: the metric names are regenerated on every
update, which is wasteful. For these system counters you can use a simple cache
to generate the metric names only once and produce no additional garbage in
updates.
{quote}
That's a good observation. We can intern those strings to avoid additional
garbage.
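
As a rough illustration of the caching idea (not the actual patch), the sketch
below builds each metric name once, keyed by the counter's enum constant, and
hands back the same String on every later update. The class MetricNameCache,
the FrameworkCounter enum, and the "jt_" prefix are hypothetical names made up
for this example; they are not part of Hadoop.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MetricNameCache {
    // Hypothetical stand-in for a framework counter enum (not Hadoop's class).
    public enum FrameworkCounter { SPILLED_RECORDS, MAP_INPUT_BYTES }

    // Keyed by the enum constant itself, so a lookup allocates nothing.
    private final ConcurrentMap<Enum<?>, String> names =
        new ConcurrentHashMap<Enum<?>, String>();
    private final String prefix;

    public MetricNameCache(String prefix) {
        this.prefix = prefix;
    }

    // Returns the cached metric name, building it only on first use.
    public String nameFor(Enum<?> counter) {
        String name = names.get(counter);
        if (name == null) {
            // String concatenation happens once per counter, not per update.
            String built = prefix + counter.name().toLowerCase(Locale.ROOT);
            String prev = names.putIfAbsent(counter, built);
            name = (prev != null) ? prev : built;
        }
        return name;
    }

    public static void main(String[] args) {
        MetricNameCache cache = new MetricNameCache("jt_");
        String first = cache.nameFor(FrameworkCounter.SPILLED_RECORDS);
        String second = cache.nameFor(FrameworkCounter.SPILLED_RECORDS);
        // The second lookup returns the very same String instance.
        System.out.println(first + " sameInstance=" + (first == second));
    }
}
```

Keying on the enum constant rather than on a concatenated group/name string
matters here: building a string key for each lookup would recreate the garbage
the cache is meant to avoid.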
> Put map-reduce framework counters to JobTrackerMetricsInst
> ----------------------------------------------------------
>
> Key: MAPREDUCE-2125
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2125
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Affects Versions: 0.22.0
> Reporter: Scott Chen
> Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2125.txt
>
>
> We have lots of useful information in the framework counters, including
> the number of spills and filesystem reads and writes.
> It would be nice to put them all in the jobtracker metrics to get a global
> view of all these numbers.