----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30357/#review70049 -----------------------------------------------------------
Ship it! Ship It! - Nate Cole On Jan. 28, 2015, 12:38 p.m., Jonathan Hurley wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30357/ > ----------------------------------------------------------- > > (Updated Jan. 28, 2015, 12:38 p.m.) > > > Review request for Ambari, Alejandro Fernandez, Mahadev Konar, Nate Cole, and > Tom Beerbower. > > > Bugs: AMBARI-9368 > https://issues.apache.org/jira/browse/AMBARI-9368 > > > Repository: ambari > > > Description > ------- > > A deadlock scenario was identified between ServiceComponentImpl and > ServiceComponentHostImpl. The root cause was the pattern of locking that is > used in ClusterImpl, ServiceImpl, ServiceComponentImpl and > ServiceComponentHostImpl. Essentially, our pattern of using 2 locks per > implementation is, in most cases, unnecessary and deadlock prone. > > ClusterImpl: > I've gone through and removed the internal lock on the cluster. The cluster > is the highest level of logical organization and should only have 1 lock > (hence the reason for a cluster global lock). It made no sense to keep > another internal lock that either locked in parallel or contradicted the > global lock. Furthermore, methods on primary keys such as clusterId don't > even need to lock since their values won't change. > > ServiceImpl: > Locks around static data or primary key data such as service names, cluster > IDs, etc, should not have locks around them since they will never change. The > clusterGlobalLock was removed from many methods where the JPA entity of the > service was being written to; a cluster lock should not be used unless the > state of the cluster is changing or being retrieved. > > ServiceComponentImpl: > Locks around static data or primary key data such as service names, cluster > IDs, etc, should not have locks around them since they will never change. In > many cases, the clusterGlobalLock was removed for the same reasons as stated > in ServiceImpl. > > ServiceComponentHostImpl > Locks around static data or primary key data such as service names, cluster > IDs, etc, should not have locks around them since they will never change. In > many cases, the clusterGlobalLock was removed for the same reasons as stated > in ServiceImpl. > > Overall, the global cluster lock was left on methods that alter the topology > of the cluster (add/delete of a service/component/host) or when refreshing > the internal entity. The cluster global lock, however, once the cluster > topology has changed. The internal locks were left in place where there was > work being done around a non-thread-safe member or when an internal entity > was having non-primary-key data set. > > > Diffs > ----- > > > ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponentImpl.java > a31e42c > ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java > 85e8314 > > ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java > 0bfaa19 > > ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java > dbdef2b > > ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/30357/diff/ > > > Testing > ------- > > Installed a small (5 node) and large (20 node) cluster and performed an > end-to-end install with all services. In both cases, I had two browsers > opened so there were interwoven requests during installations. > > In the small cluster, I forced a yum repository error on install so that I > would need to retry the failed install. The cluster deleted and the services > were repopulated on the retry. > > After both clusters were installed, I execute an Add Service from the browser > at the same time as a DELETE service request from a 2nd browser. In both > cases, the requests were scheduled correctly and the services were propagated. > > Testing also included adding new hosts, and adding new host components from 2 > different browers simultaneously. > > New tests were added to cover threads performing read serialization on the > impl's concurrently with impl creation and writes. These tests deadlocked > before the changes. > > [INFO] > ------------------------------------------------------------------------ > [INFO] BUILD SUCCESS > [INFO] > ------------------------------------------------------------------------ > [INFO] Total time: 29:21 min > [INFO] Finished at: 2015-01-28T01:36:06-05:00 > [INFO] Final Memory: 36M/443M > [INFO] > ------------------------------------------------------------------------ > > > Thanks, > > Jonathan Hurley > >
