I thought you would add some of the answers as comments in the code (for example, why the exception is being swallowed).
On 1/30/13 2:37 AM, "Koushik Das" <koushik....@citrix.com> wrote:

>
>-----------------------------------------------------------
>This is an automatically generated e-mail. To reply, visit:
>https://reviews.apache.org/r/9133/
>-----------------------------------------------------------
>
>(Updated Jan. 30, 2013, 10:37 a.m.)
>
>
>Review request for cloudstack, Abhinandan Prateek and Alex Huang.
>
>
>Description
>-------
>
>The issue happens randomly when hosts in a cluster get distributed
>across multiple MS. Hosts can get split in the following scenarios:
>  a. Add host: the MS on which add host is executed takes ownership of
>the host. So if 2 hosts belonging to the same cluster are added from 2
>different MS, the cluster gets split.
>  b. scanDirectAgentToLoad: this runs every 90 secs and checks whether
>there are any hosts that need to be reconnected. The current host scan
>logic can also lead to a split.
>
>  The idea is to fix (b) to ensure that hosts in a cluster are managed
>by the same MS. For (a), only the entry in the database is created,
>except when the host being added is the first in the cluster (in that
>case agent creation happens at the same time); (b) then takes care of
>the connection and agent creation part. Since addHost currently only
>creates an entry in the db, there is a small window where the host state
>is shown as 'Alert' until (b) is scheduled and picks up the host to make
>a connection. The MS doing add host will immediately schedule a scan
>task and also send a notification to its peers to start the scan task.
>
>
>This addresses bug CLOUDSTACK-606.
>
>
>Diffs (updated)
>-----
>
>  api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java PRE-CREATION
>  server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java ca0bf5c
>  server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88
>  server/src/com/cloud/host/dao/HostDaoImpl.java 0881675
>  server/src/com/cloud/resource/ResourceManagerImpl.java f82424a
>
>Diff: https://reviews.apache.org/r/9133/diff/
>
>
>Testing
>-------
>
>Manually tested the following scenarios:
>
>- Added hostA in cluster1 from MS1; it gets owned by MS1 as the first
>host in the cluster. Added hostB in the same cluster1 from MS2. Once both
>hosts are in 'Up' state, ensured that they are owned by the same MS
>(i.e. MS1).
>- Error scenarios where a host goes to disconnected, alert or down state
>(host disconnected from the network) and is then reconnected (connected
>to the network again). Ensured that, once reconnected, the host is owned
>by the same MS as the other hosts in the cluster.
>- A scenario where hosts are already in a distributed state (before the
>fix, added hosts to the same cluster from different MSs). Ensured that
>after applying the patch and restarting the MSs, distribution happens
>properly.
>- Did basic validation in a single MS setup: added multiple hosts in a
>cluster and created VMs on them.
>
>
>Thanks,
>
>Koushik Das
>
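
As a rough illustration of the rule described for (b), that every host in a cluster should end up owned by the same MS, here is a minimal Java sketch. It is not the code from the patch: the class ClusterOwnershipSketch, the Host stand-in and hostsToClaim are hypothetical names, and the database is modelled as an in-memory list. A real scan would also have to make the claim atomic in the database so that two MS cannot claim the same unowned cluster at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names come from the CloudStack patch.
class ClusterOwnershipSketch {

    /** Stand-in for a host row: id, cluster id, owning MS id (null when unowned). */
    static final class Host {
        final long id;
        final long clusterId;
        final Long ownerMsId;

        Host(long id, long clusterId, Long ownerMsId) {
            this.id = id;
            this.clusterId = clusterId;
            this.ownerMsId = ownerMsId;
        }
    }

    /**
     * Returns the unowned hosts that the scanning MS may pick up without
     * splitting a cluster: hosts whose cluster has no owner yet, or whose
     * cluster is already owned by the scanning MS itself.
     */
    static List<Host> hostsToClaim(long scanningMsId, List<Host> hosts) {
        // For each cluster, remember the MS that already owns one of its hosts.
        Map<Long, Long> clusterOwner = new HashMap<>();
        for (Host h : hosts) {
            if (h.ownerMsId != null) {
                clusterOwner.putIfAbsent(h.clusterId, h.ownerMsId);
            }
        }

        List<Host> claimable = new ArrayList<>();
        for (Host h : hosts) {
            if (h.ownerMsId != null) {
                continue; // already connected through some MS
            }
            Long owner = clusterOwner.get(h.clusterId);
            if (owner == null || owner == scanningMsId) {
                claimable.add(h);
                // Claiming the first host of an unowned cluster makes this MS
                // the owner for the remaining hosts of that cluster as well.
                clusterOwner.put(h.clusterId, scanningMsId);
            }
        }
        return claimable;
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                new Host(1L, 1L, 1L),     // cluster 1 already owned by MS 1
                new Host(2L, 1L, null),   // unowned host in cluster 1
                new Host(3L, 2L, null),   // cluster 2 has no owner yet
                new Host(4L, 2L, null));
        for (Host h : hostsToClaim(2L, hosts)) {
            System.out.println("MS 2 may claim host " + h.id);
        }
    }
}
```

With this rule, a scan on MS 2 leaves the unowned host in cluster 1 for MS 1 and takes only hosts from clusters that are still unowned, which is the behaviour the first test scenario in the review checks for.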