-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30357/#review70048
-----------------------------------------------------------

Ship it!


Wow, major clean up.  Nice!

- Tom Beerbower


On Jan. 28, 2015, 5:38 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30357/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2015, 5:38 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Mahadev Konar, Nate Cole, and 
> Tom Beerbower.
> 
> 
> Bugs: AMBARI-9368
>     https://issues.apache.org/jira/browse/AMBARI-9368
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> A deadlock scenario was identified between ServiceComponentImpl and 
> ServiceComponentHostImpl. The root cause was the pattern of locking that is 
> used in ClusterImpl, ServiceImpl, ServiceComponentImpl and 
> ServiceComponentHostImpl. Essentially, our pattern of using 2 locks per 
> implementation is, in most cases, unnecessary and deadlock prone. 
> 
> ClusterImpl:
> I've gone through and removed the internal lock on the cluster. The cluster 
> is the highest level of logical organization and should only have 1 lock 
> (hence the reason for a cluster global lock). It made no sense to keep 
> another internal lock that either locked in parallel or contradicted the 
> global lock. Furthermore, methods on primary keys such as clusterId don't 
> even need to lock since their values won't change.
> 
> ServiceImpl:
> Locks around static data or primary key data such as service names, cluster 
> IDs, etc, should not have locks around them since they will never change. The 
> clusterGlobalLock was removed from many methods where the JPA entity of the 
> service was being written to; a cluster lock should not be used unless the 
> state of the cluster is changing or being retrieved.
> 
> ServiceComponentImpl:
> Locks around static data or primary key data such as service names, cluster 
> IDs, etc, should not have locks around them since they will never change. In 
> many cases, the clusterGlobalLock was removed for the same reasons as stated 
> in ServiceImpl.
> 
> ServiceComponentHostImpl
> Locks around static data or primary key data such as service names, cluster 
> IDs, etc, should not have locks around them since they will never change. In 
> many cases, the clusterGlobalLock was removed for the same reasons as stated 
> in ServiceImpl.
> 
> Overall, the global cluster lock was left on methods that alter the topology 
> of the cluster (add/delete of a service/component/host) or when refreshing 
> the internal entity. The cluster global lock, however, once the cluster 
> topology has changed. The internal locks were left in place where there was 
> work being done around a non-thread-safe member or when an internal entity 
> was having non-primary-key data set.
> 
> 
> Diffs
> -----
> 
>   
> ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponentImpl.java
>  a31e42c 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 
> 85e8314 
>   
> ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
>  0bfaa19 
>   
> ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java
>  dbdef2b 
>   
> ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30357/diff/
> 
> 
> Testing
> -------
> 
> Installed a small (5 node) and large (20 node) cluster and performed an 
> end-to-end install with all services. In both cases, I had two browsers 
> opened so there were interwoven requests during installations. 
> 
> In the small cluster, I forced a yum repository error on install so that I 
> would need to retry the failed install. The cluster deleted and the services 
> were repopulated on the retry.
> 
> After both clusters were installed, I execute an Add Service from the browser 
> at the same time as a DELETE service request from a 2nd browser. In both 
> cases, the requests were scheduled correctly and the services were propagated.
> 
> Testing also included adding new hosts, and adding new host components from 2 
> different browers simultaneously. 
> 
> New tests were added to cover threads performing read serialization on the 
> impl's concurrently with impl creation and writes. These tests deadlocked 
> before the changes.
> 
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] 
> ------------------------------------------------------------------------
> [INFO] Total time: 29:21 min
> [INFO] Finished at: 2015-01-28T01:36:06-05:00
> [INFO] Final Memory: 36M/443M
> [INFO] 
> ------------------------------------------------------------------------
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>

Reply via email to