[
https://issues.apache.org/jira/browse/MAPREDUCE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927107#action_12927107
]
Scott Chen commented on MAPREDUCE-2125:
---------------------------------------
Hey Luke, Thanks for the comments :)
bq. # Have you tried to benchmark the patch at scale? Calling job.getCounters
in completeJob would bring a busy JT on a large cluster to its knees. Think
about calling getCounters (which is essentially an O(n) operation) a few
hundred times per second!
You are right that getCounters() is really expensive.
But here we call it in JobTrackerMetricsInst.doUpdates(), which runs only
once every 5 seconds, so the impact on JT performance is very minor.
We have been running this for months on our 3000-node cluster, which runs
many big jobs, and it has been working fine.
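A minimal sketch of the cost argument above, assuming a hypothetical CounterSource interface in place of the JobTracker's real job and Counters classes (this is not the actual JobTrackerMetricsInst code). The expensive O(n) snapshot runs only inside a periodic doUpdates() pass, so the number of snapshot passes per second is fixed by the timer (one every 5 seconds) rather than by how many jobs complete per second.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicCounterAggregator {
    /** Hypothetical stand-in for one job's framework counters. */
    public interface CounterSource {
        /** Assumed to be expensive: walks all of the job's tasks on each call. */
        Map<String, Long> snapshotCounters();
    }

    private final List<CounterSource> jobs = new CopyOnWriteArrayList<>();
    // Last aggregated view, replaced wholesale on each pass.
    private volatile Map<String, Long> totals = Map.of();

    public void addJob(CounterSource job) {
        jobs.add(job);
    }

    /** One update pass: sum every job's counters into a fresh map, then publish it. */
    public void doUpdates() {
        Map<String, Long> sum = new HashMap<>();
        for (CounterSource job : jobs) {
            job.snapshotCounters().forEach((name, value) -> sum.merge(name, value, Long::sum));
        }
        totals = Map.copyOf(sum);
    }

    public Map<String, Long> totals() {
        return totals;
    }

    /** Schedule doUpdates() every periodSeconds (e.g. 5) on one background thread; caller shuts it down. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::doUpdates, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

Publishing an immutable snapshot through a volatile field means a reader such as a dashboard poller never sees a partially summed map.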
bq. The necessity of having these total aggregate counts in real time. Rumen or
other MR log processing tools can get these aggregates for performance analysis
without impacting JT performance.
We have an internal tool that graphs these metrics on a dashboard. It is really
useful for debugging cluster issues in real time. I believe Y! and others
have similar use cases.
bq. If you really want these counters in real time, you should implement it in
TT where it can send the metrics to distributed metrics aggregators with UDP
etc. and can be easily disabled/enabled via the metrics system.
That sounds like a good solution too, but I prefer the current approach because
it is very simple: anything we add to JobCounter and TaskCounter automatically
goes to the metrics, without any additional code.
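Why new counters show up "for free" can be sketched by registering one metric per enum constant. This is a hypothetical illustration: TaskCounterSketch is a made-up enum standing in for the real JobCounter/TaskCounter enums, and the export naming scheme is an assumption, not what the patch does.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class EnumCounterMetrics {
    /** Hypothetical counter enum; the real JobCounter/TaskCounter enums have different members. */
    public enum TaskCounterSketch { SPILLED_RECORDS, MAP_INPUT_RECORDS, REDUCE_OUTPUT_RECORDS }

    private final EnumMap<TaskCounterSketch, Long> totals = new EnumMap<>(TaskCounterSketch.class);

    public EnumCounterMetrics() {
        // One metric per enum constant: adding a constant adds a metric with no extra code.
        for (TaskCounterSketch counter : TaskCounterSketch.values()) {
            totals.put(counter, 0L);
        }
    }

    public void increment(TaskCounterSketch counter, long delta) {
        totals.merge(counter, delta, Long::sum);
    }

    /** Export all counters, in declaration order, under names derived from the constants. */
    public Map<String, Long> export() {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<TaskCounterSketch, Long> entry : totals.entrySet()) {
            out.put(entry.getKey().name().toLowerCase(Locale.ROOT), entry.getValue());
        }
        return out;
    }
}
```

Because the metric set is derived from TaskCounterSketch.values(), appending a constant to the enum is the only change needed for it to appear in export().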
> Put map-reduce framework counters to JobTrackerMetricsInst
> ----------------------------------------------------------
>
> Key: MAPREDUCE-2125
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2125
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Affects Versions: 0.22.0
> Reporter: Scott Chen
> Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-2125.txt
>
>
> We have lots of useful information in the framework counters, including
> #spills and filesystem reads and writes.
> It would be nice to put them all in the jobtracker metrics to get a global
> view of all these numbers.
--
This message is automatically generated by JIRA.