> On Feb. 24, 2015, 7:23 a.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java,
> >  lines 413-417
> > <https://reviews.apache.org/r/31341/diff/1/?file=873469#file873469line413>
> >
> >     Do we even need a read lock around the simple boolean check?  There is 
> > a write lock around the persist() method which seems like the only place 
> > where it's required.

Thanks for the review! I was debating the same issue. I've left the locks in 
place that involve non-thread-safe members. In this case though, you're right 
that the `persist()` method is enough. Furthermore, this member is only ever 
set in the event that the service is brand new.

Overall, we really need to remove the locks from many of the read methods. I'm 
seriously debating removing the read locks around `convertToResponse()` but 
felt that this close to release, that's risky.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73801
-----------------------------------------------------------


On Feb. 24, 2015, 12:36 a.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2015, 12:36 a.m.)
> 
> 
> Review request for Ambari, Nate Cole and Tom Beerbower.
> 
> 
> Bugs: AMBARI-9761
>     https://issues.apache.org/jira/browse/AMBARI-9761
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Another case of misunderstanding how locks work.
> 
> During provisioning of a cluster with at least 200 hosts, Ambari Server 
> becomes unresponsive. Based on the thread dump, there exists a deadlock 
> between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
> 
> qtp626652285-97   ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47  ServiceComponentHostImpl.setRestartRequired() (sch 
> writeLock)
> qtp626652285-97   ServiceComponentHostImpl.getMaintenanceState() (sch 
> readLock BLOCKED by qtp1282624353-47)
> qtp1282624353-60  ClusterImpl.recalculateClusterVersionState() (cluster 
> writeLock BLOCKED by qtp626652285-97)
> qtp1282624353-47  ServiceComponentHostImpl.isPersisted() (cluster readLock 
> BLOCKED by qtp1282624353-60)
> 
> The underlying problem is that a writeLock.lock() is parked which causes all 
> subsequent readLock.lock() requests to also park. This includes the request 
> from qtp1282624353-47 which is holding a writeLock on the SCH which, in turn, 
> is blocking qtp626652285-97 (the original cluster readLock reader which 
> blocks the cluster write)
> 
> Long story short is that I think we need to revisit locks again after 2.0.0; 
> I just don't see a need for locking on reads in most places - that's what the 
> database is doing for us.
> 
> 
> Diffs
> -----
> 
>   
> ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java
>  117526c 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 
> 0de62ea 
>   
> ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
>  c43044c 
>   
> ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java
>  96a1443 
> 
> Diff: https://reviews.apache.org/r/31341/diff/
> 
> 
> Testing
> -------
> 
> Reproduced the deadlock in a unit test first, and then verified the deadlock 
> does not occur anymore in the test after applying the patch.
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>

Reply via email to