[ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206689#comment-13206689 ]
Kihwal Lee commented on HADOOP-8050:
------------------------------------
bq. The correct fix (sans moving jmx to a sink) is not removing the lock on
the metrics system in the snapshot thread but fixing the lock order in
MetricsSourceAdapter (so that source.getMetrics is called without holding the
adapter lock).
I tried to do this in the new patch. Because updateJmxCache() does not block
while calling getMetrics(), some callers may not get the latest metric data if
another thread is already executing updateJmxCache().
The patch passes all metrics-related tests.
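To illustrate the lock ordering described above, here is a minimal sketch. It is
not the attached patch: the class AdapterSketch, the "updating" flag, and the
Supplier standing in for source.getMetrics() are all hypothetical names chosen
for this example. The idea is only that the source is queried outside the
adapter lock, and that a caller arriving during a refresh gets the cached
(possibly stale) data instead of waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for an adapter; NOT the real MetricsSourceAdapter. */
class AdapterSketch {
    // Assumption: stands in for source.getMetrics(); may take its own locks.
    private final Supplier<Map<String, Long>> source;
    private Map<String, Long> cache = new HashMap<>(); // guarded by "this"
    private boolean updating = false;                  // guarded by "this"

    AdapterSketch(Supplier<Map<String, Long>> source) {
        this.source = source;
    }

    /** Returns metrics, refreshing the cache unless a refresh is already running. */
    Map<String, Long> getAttributes() {
        synchronized (this) {
            if (updating) {
                // A refresh is in progress: return the cached, possibly stale,
                // data rather than blocking.
                return cache;
            }
            updating = true;
        }
        Map<String, Long> fresh = null;
        try {
            // Called WITHOUT holding the adapter lock, so there is no
            // adapter-lock -> source-lock ordering that could deadlock.
            fresh = new HashMap<>(source.get());
        } finally {
            synchronized (this) {
                if (fresh != null) {
                    cache = fresh; // publish the new snapshot
                }
                updating = false;  // always clear the flag, even on failure
            }
        }
        synchronized (this) {
            return cache;
        }
    }
}
```

The trade-off is the one noted in the comment: a reader that arrives while a
refresh is running sees the previous snapshot instead of the newest values.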
> Deadlock in metrics
> -------------------
>
> Key: HADOOP-8050
> URL: https://issues.apache.org/jira/browse/HADOOP-8050
> Project: Hadoop Common
> Issue Type: Bug
> Components: metrics
> Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 1.1.0, 1.0.1
>
> Attachments: hadoop-8050-branch-1.patch.txt,
> hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt,
> hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt,
> hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC
> works but the web UI and hftp stop working. I haven't looked at the trunk too
> closely, but it might happen there too.