-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73795
-----------------------------------------------------------

Ship it!

- Tom Beerbower


On Feb. 24, 2015, 5:36 a.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2015, 5:36 a.m.)
> 
> 
> Review request for Ambari, Nate Cole and Tom Beerbower.
> 
> 
> Bugs: AMBARI-9761
>     https://issues.apache.org/jira/browse/AMBARI-9761
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Another case of misunderstanding how locks work.
> 
> During provisioning of a cluster with at least 200 hosts, Ambari Server 
> becomes unresponsive. Based on the thread dump, there exists a deadlock 
> between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
> 
> qtp626652285-97   ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47  ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97   ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
> qtp1282624353-60  ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
> qtp1282624353-47  ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)
> 
> The underlying problem is that once a writeLock.lock() request is parked, 
> all subsequent readLock.lock() requests park behind it. This includes the 
> request from qtp1282624353-47, which holds the writeLock on the SCH and is 
> therefore blocking qtp626652285-97 (the original cluster readLock holder, 
> which in turn blocks the cluster writer).
> 
> In short, I think we need to revisit locking again after 2.0.0. I just 
> don't see a need for locking on reads in most places; that's what the 
> database is doing for us.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java 117526c 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea 
>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java c43044c 
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java 96a1443 
> 
> Diff: https://reviews.apache.org/r/31341/diff/
> 
> 
> Testing
> -------
> 
> Reproduced the deadlock in a unit test first, and then verified the deadlock 
> does not occur anymore in the test after applying the patch.
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>
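
The parked-writer behavior described in the quoted request can be seen in a small standalone sketch. This is not code from the patch or from ClusterDeadlockTest; the class and method names are hypothetical. It assumes a java.util.concurrent ReentrantReadWriteLock in fair mode, chosen only because the Javadoc documents the queuing behavior for that mode; the review does not say how Ambari's locks are configured.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Ambari.
class ParkedWriterDemo {

    /** Returns true if a new reader gets the read lock while a writer is parked. */
    static boolean secondReaderAcquires() throws InterruptedException {
        // Assumption: fair mode, so that waiting writers block new readers as documented.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

        // First reader (this thread) takes the read lock, like the cluster readLock holder.
        lock.readLock().lock();

        // Writer parks because a read lock is held, like the cluster writeLock request.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThread(writer)) {
            Thread.sleep(10); // wait until the writer is actually queued
        }

        // A second reader now queues behind the parked writer, even though
        // only a read lock is currently held. Use a timeout so the demo ends.
        boolean[] acquired = new boolean[1];
        Thread reader = new Thread(() -> {
            try {
                acquired[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                if (acquired[0]) {
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        reader.join();

        // Release the first read lock so the writer can finish.
        lock.readLock().unlock();
        writer.join();
        return acquired[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second reader acquired: " + secondReaderAcquires());
    }
}
```

In the reported deadlock the second reader is itself holding another write lock that the first reader is waiting on, so with an untimed lock() none of the three threads can make progress.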
