[ 
https://issues.apache.org/jira/browse/AMBARI-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated AMBARI-9368:
-------------------------------
    Attachment: monitor_lock-3-pid10099.txt
                monitor_lock-2-pid10099.txt
                monitor_lock-1-pid10099.txt

I have been seeing this issue on our production 1.6.1 ambari-server over the 
last few days. We are adding new nodes/hardware in bulk, and our automation 
component makes API calls back to the server to install/start the components 
on these new nodes. This is how I found AMBARI-9334.

After the hangs this week and last week, I've confirmed it's always in the same 
place as you report above, Jon. I've also been grabbing thread dumps, and the 
IBM JCA tool does not report these as a deadlock. From my experience with 
WebSphere and debugging apps, I think it's up to the JDK runtime to determine 
whether a deadlock is really occurring.
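
One possible reason the tools stay silent, offered as an assumption rather than 
a confirmed diagnosis of the Ambari code: the JVM's deadlock detection follows 
exclusive lock owners, and a ReentrantReadWriteLock read lock records no owner. 
The sketch below rebuilds the lock order from the quoted report with two 
stand-in locks and then asks ThreadMXBean.findDeadlockedThreads() what it sees. 
The class name ReadLockCycleProbe and the lock and thread names are 
hypothetical; no Ambari classes are used.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone probe; it does not use any Ambari classes.
public class ReadLockCycleProbe {

    /** Builds the read/write lock cycle, then returns the thread ids the JVM reports as deadlocked. */
    public static long[] probe() {
        // Stand-ins for the locks inside ServiceComponentImpl and ServiceComponentHostImpl.
        ReentrantReadWriteLock componentLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock hostLock = new ReentrantReadWriteLock();
        CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Plays qtp572501352-104: component read lock first, then host read lock.
        Thread reader = new Thread(() -> {
            componentLock.readLock().lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                hostLock.readLock().lockInterruptibly();
                hostLock.readLock().unlock();
            } catch (InterruptedException e) {
                // Interrupted by probe() once the measurement is done.
            } finally {
                componentLock.readLock().unlock();
            }
        }, "reader");

        // Plays qtp572501352-34: host write lock first, then component write lock.
        Thread writer = new Thread(() -> {
            hostLock.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                componentLock.writeLock().lockInterruptibly();
                componentLock.writeLock().unlock();
            } catch (InterruptedException e) {
                // Interrupted by probe() once the measurement is done.
            } finally {
                hostLock.writeLock().unlock();
            }
        }, "writer");

        reader.setDaemon(true);
        writer.setDaemon(true);
        reader.start();
        writer.start();
        try {
            // Wait until each thread is queued on the lock the other one holds.
            long start = System.nanoTime();
            while (!(hostLock.hasQueuedThread(reader) && componentLock.hasQueuedThread(writer))) {
                if (System.nanoTime() - start > 10_000_000_000L) {
                    throw new IllegalStateException("cycle was not established");
                }
                Thread.sleep(10);
            }
            Thread.sleep(100); // give both threads time to park
            long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
            return ids == null ? new long[0] : ids;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Break the cycle so both helper threads can unwind.
            reader.interrupt();
            writer.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads reported as deadlocked: " + probe().length);
    }
}
```

If the assumption holds, the probe reports no deadlocked threads even though 
neither helper thread can make progress until it is interrupted, which would 
match what jstack and IBM JCA show here.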

I've attached my output from IBM JCA after running some thread dumps on the 
hung ambari-server today. It definitely seems perplexing as to why there are 
double locks. Is there any documentation on why it was done?


> Deadlock Between Dependent Cluster/Service/Component/Host Implementations
> -------------------------------------------------------------------------
>
>                 Key: AMBARI-9368
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9368
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.1
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: jstack.29096, monitor_lock-1-pid10099.txt, 
> monitor_lock-2-pid10099.txt, monitor_lock-3-pid10099.txt
>
>
> Looks like a textbook deadlock. Why jstack doesn't report it, I don't know.
> Call Hierarchy
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
>     ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
>   
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>     ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> {code} 
>    
> Deadlock Order
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>   ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> qtp572501352-104
>   ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
