[
https://issues.apache.org/jira/browse/HADOOP-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822816#comment-13822816
]
Chris Nauroth commented on HADOOP-10090:
----------------------------------------
Here is a thought regarding the system source issue and reintroducing
synchronization around {{MetricsSource#getMetrics}} calls.
My understanding of the HADOOP-8050 deadlock is that we had a lock ordering
conflict between a JMX thread (locking {{MetricsSourceAdapter}} and then
{{MetricsSystemImpl}}) and a snapshotting thread (locking {{MetricsSystemImpl}}
and then {{MetricsSourceAdapter}}). HADOOP-8050 resolved the deadlock by
releasing the lock on the {{MetricsSourceAdapter}} before calling
{{MetricsSource#getMetrics}}.
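As a rough illustration of the lock ordering conflict described above (not the actual Hadoop code), the following sketch has two threads take two locks in opposite orders. The class name, method names, and the two locks are hypothetical stand-ins for the {{MetricsSourceAdapter}} and {{MetricsSystemImpl}} monitors. It assumes {{ReentrantLock#tryLock}} with a timeout in place of {{synchronized}} so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderConflictSketch {

    /**
     * Takes 'first', waits until both threads hold their first lock, then
     * tries 'second' with a timeout. Returns whether 'second' was acquired.
     */
    static boolean attempt(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean got = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep holding 'first' until the other thread has finished trying,
            // so the outcome does not depend on timing.
            bothTried.countDown();
            bothTried.await();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    /** Returns {jmxGotSecondLock, snapshotGotSecondLock}. */
    static boolean[] run() {
        ReentrantLock adapterLock = new ReentrantLock(); // stand-in for the adapter monitor
        ReentrantLock systemLock = new ReentrantLock();  // stand-in for the system monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // JMX-like thread: adapter first, then system.
        Thread jmx = new Thread(() ->
            gotSecond[0] = attempt(adapterLock, systemLock, bothHoldFirst, bothTried));
        // Snapshot-like thread: system first, then adapter.
        Thread snapshot = new Thread(() ->
            gotSecond[1] = attempt(systemLock, adapterLock, bothHoldFirst, bothTried));

        jmx.start();
        snapshot.start();
        try {
            jmx.join();
            snapshot.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gotSecond;
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("jmx got second lock: " + result[0]);
        System.out.println("snapshot got second lock: " + result[1]);
    }
}
```

In this sketch neither thread can obtain its second lock while the other still holds it. With plain {{synchronized}} blocks and no timeout, the same interleaving would block both threads indefinitely.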
What if instead we do the following:
# Change {{MetricsSourceAdapter#getMetrics}} as follows:
{code}
Iterable<MetricsRecordImpl> getMetrics(MetricsBuilderImpl builder,
                                       boolean all) {
  synchronized (source) {
    synchronized (this) {
      // existing method logic here
    }
  }
}
{code}
# Change {{MetricsSystemImpl}} so that it implements {{MetricsSource}} directly
instead of using an anonymous inner class.
The first part synchronizes {{getMetrics}} calls using a locking order that's
consistent with the snapshotting threads. The second part is required so that
the first part's synchronization on the source is really synchronizing on the
{{MetricsSystemImpl}} instance instead of the separate anonymous inner class
instance.
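A minimal sketch of how the two parts could fit together, under stated assumptions: {{SourceSketch}}, {{AdapterSketch}}, and {{SystemSketch}} are hypothetical stand-ins for {{MetricsSource}}, {{MetricsSourceAdapter}}, and {{MetricsSystemImpl}}, and the method bodies are placeholders rather than the real metrics logic. Because the system object is itself the source, {{synchronized (source)}} in the adapter takes the same monitor that the snapshot path already holds, so both paths lock the system first and the adapter second.

```java
import java.util.ArrayList;
import java.util.List;

class LockOrderSketch {

    /** Hypothetical stand-in for MetricsSource. */
    interface SourceSketch {
        void getMetrics(List<String> out);
    }

    /** Hypothetical stand-in for MetricsSourceAdapter. */
    static final class AdapterSketch {
        private final SourceSketch source;

        AdapterSketch(SourceSketch source) {
            this.source = source;
        }

        List<String> getMetrics() {
            // Part 1: lock the source first, then the adapter.
            synchronized (source) {
                synchronized (this) {
                    List<String> out = new ArrayList<>();
                    source.getMetrics(out);
                    return out;
                }
            }
        }
    }

    /** Hypothetical stand-in for MetricsSystemImpl; part 2: it is the source itself. */
    static final class SystemSketch implements SourceSketch {
        private final AdapterSketch selfAdapter = new AdapterSketch(this);
        private int snapshots;

        @Override
        public synchronized void getMetrics(List<String> out) {
            out.add("snapshots=" + snapshots);
        }

        // Snapshot path: system monitor first, then (inside the adapter) the adapter monitor.
        synchronized List<String> snapshot() {
            snapshots++;
            return selfAdapter.getMetrics();
        }

        AdapterSketch adapter() {
            return selfAdapter;
        }
    }

    /** Runs a snapshot-like and a JMX-like thread; returns true if both finish in time. */
    static boolean runConcurrently(int iterations) {
        SystemSketch system = new SystemSketch();
        Thread snapshotter = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                system.snapshot();
            }
        });
        Thread jmx = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                system.adapter().getMetrics();
            }
        });
        // Daemon threads so a hang would not keep the JVM alive.
        snapshotter.setDaemon(true);
        jmx.setDaemon(true);
        snapshotter.start();
        jmx.start();
        try {
            snapshotter.join(5000);
            jmx.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !snapshotter.isAlive() && !jmx.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + runConcurrently(1000));
    }
}
```

If the source were a separate anonymous inner class instance, {{synchronized (source)}} would take a different monitor from the one held by the snapshot path, and the two paths would no longer share a single lock order.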
> Jobtracker metrics not updated properly after execution of a mapreduce job
> --------------------------------------------------------------------------
>
> Key: HADOOP-10090
> URL: https://issues.apache.org/jira/browse/HADOOP-10090
> Project: Hadoop Common
> Issue Type: Bug
> Components: metrics
> Affects Versions: 1.2.1
> Reporter: Ivan Mitic
> Assignee: Ivan Mitic
> Attachments: HADOOP-10090.branch-1.patch, OneBoxRepro.png
>
>
> After executing a wordcount mapreduce sample job, jobtracker metrics are not
> updated properly. Often the response from the jobtracker reports a higher
> number of job_completed than job_submitted (for example, 8 jobs completed and
> 7 jobs submitted).
> Issue reported by Toma Paunovic.
--
This message was sent by Atlassian JIRA
(v6.1#6144)