[
https://issues.apache.org/jira/browse/AMBARI-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Senia updated AMBARI-9368:
-------------------------------
Attachment: monitor_lock-3-pid10099.txt
monitor_lock-2-pid10099.txt
monitor_lock-1-pid10099.txt
I have been seeing this issue in our environment with our production 1.6.1
ambari-server over the last few days due to mass adding of new nodes/hardware
and our automation component making API calls back to install/start the
components on these new nodes. This is how I found AMBARI-9334.
After the hangs this week and last week I've confirmed it's always in the same
place as you report above, Jon. I've also been grabbing thread dumps, and the
IBM JCA tool does not report these as a deadlock. From my experience with
WebSphere and debugging apps, I think it's up to the JDK runtime to determine
whether a deadlock is really occurring.
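For anyone who wants to ask the runtime directly, here is a minimal sketch
using the standard ThreadMXBean API. The class name DeadlockProbe is
hypothetical and this is not Ambari code. One caveat, going by the
java.lang.management Javadoc: only monitors and "ownable synchronizers"
(ReentrantLock, and the write lock but not the read lock of
ReentrantReadWriteLock) are tracked, so a cycle that passes through a read
lock may well go unreported. That would fit with the tools staying quiet here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ambari: asks the JVM which threads it
// considers deadlocked and describes each one.
public class DeadlockProbe {

    /** Returns one line per thread the JVM reports as deadlocked (empty if none). */
    public static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        // findDeadlockedThreads() covers monitors and ownable synchronizers;
        // fall back to monitors only if the JVM cannot track synchronizers.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();

        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines; // null means no cycle was found
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread ended between the two calls
            }
            lines.add(info.getThreadName()
                    + " waits on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = describeDeadlocks();
        if (lines.isEmpty()) {
            System.out.println("JVM reports no deadlocked threads");
        } else {
            lines.forEach(System.out::println);
        }
    }
}
```

An empty result only means the JVM found no cycle among the locks it tracks,
so it does not rule out the read/write lock hang described in this issue.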
I've attached my output from IBM JCA after running some thread dumps on the
hung ambari-server today. It definitely seems perplexing as to why there are
double locks. Are there any docs on why it was done?
> Deadlock Between Dependent Cluster/Service/Component/Host Implementations
> -------------------------------------------------------------------------
>
> Key: AMBARI-9368
> URL: https://issues.apache.org/jira/browse/AMBARI-9368
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 1.6.1
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: jstack.29096, monitor_lock-1-pid10099.txt,
> monitor_lock-2-pid10099.txt, monitor_lock-3-pid10099.txt
>
>
> Looks like a textbook deadlock. Why jstack doesn't report it, I don't know.
> Call Hierarchy
> {code}
> qtp572501352-104
> ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
> ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
>
> qtp572501352-34
> ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
> ServiceComponentImpl.refresh() readWriteLock.writeLock() BLOCKED
> {code}
>
> Deadlock Order
> {code}
> qtp572501352-104
> ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
> qtp572501352-34
> ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
> ServiceComponentImpl.refresh() readWriteLock.writeLock() BLOCKED
> qtp572501352-104
> ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
> {code}
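The ordering quoted above can be reproduced outside Ambari with two plain
ReentrantReadWriteLock instances. This is only a rough sketch under stated
assumptions: LockOrderDemo, componentLock and hostLock are hypothetical
stand-ins for the locks in ServiceComponentImpl and ServiceComponentHostImpl,
and it uses a timed tryLock() where the real code calls lock(), so the demo
gives up instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo, not Ambari code: two threads take two read/write locks
// in opposite order, mirroring the "Deadlock Order" trace.
public class LockOrderDemo {

    /** Returns true if both threads failed to get their second lock. */
    public static boolean reproduce(long timeoutMillis) {
        ReentrantReadWriteLock componentLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock hostLock = new ReentrantReadWriteLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean readerBlocked = new AtomicBoolean();
        AtomicBoolean writerBlocked = new AtomicBoolean();

        // Like qtp572501352-104: component read lock, then host read lock.
        Thread reader = new Thread(() -> holdThenTry(
                componentLock.readLock(), hostLock.readLock(),
                bothHoldFirst, bothTried, timeoutMillis, readerBlocked), "reader");
        // Like qtp572501352-34: host write lock, then component write lock.
        Thread writer = new Thread(() -> holdThenTry(
                hostLock.writeLock(), componentLock.writeLock(),
                bothHoldFirst, bothTried, timeoutMillis, writerBlocked), "writer");

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return readerBlocked.get() && writerBlocked.get();
    }

    private static void holdThenTry(Lock first, Lock second,
                                    CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                    long timeoutMillis, AtomicBoolean blocked) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With lock() this call would never return; the timeout lets the demo end.
            boolean acquired = second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            blocked.set(!acquired);
            // Keep the first lock until both attempts are finished, so neither
            // thread can slip through when the other one gives up.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("cycle reproduced: " + reproduce(200));
    }
}
```

If both classes always took these locks in the same order (say component
before host), the second thread would simply wait for the first one to finish
instead of forming a cycle. Whether that is practical in the Ambari code is a
separate question.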
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)