I hope you have read this:
https://cwiki.apache.org/confluence/x/twDFAQ

A good log has the 5 W's: Who, What, Where, Why and When. Your log does not indicate What, Why or Where. A better log would be:
"Caught an exception while trying to schedule a host scan task on <>: ignoring because foo"
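To make the suggestion concrete, here is a minimal sketch of a log call whose message states What, Where and Why. It assumes java.util.logging so that it runs standalone, not whatever logger ClusterManagerImpl actually uses. The class name, the method names, the "ms-2" target and the reason string are all hypothetical and only there for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: none of these names come from the patch under review.
class HostScanLogExample {
    private static final Logger LOGGER =
            Logger.getLogger(HostScanLogExample.class.getName());

    // Builds a message that says what failed, where, and why it is being ignored.
    static String describeIgnoredFailure(String target, String reason) {
        return "Caught an exception while trying to schedule a host scan task on "
                + target + ": ignoring because " + reason;
    }

    // Hypothetical stand-in for the real scheduling call; it always fails here.
    static void scheduleHostScanTask(String target) {
        throw new IllegalStateException("peer unreachable");
    }

    public static void main(String[] args) {
        String target = "ms-2"; // hypothetical peer name
        try {
            scheduleHostScanTask(target);
        } catch (IllegalStateException e) {
            // Pass the exception itself so the stack trace is not lost.
            LOGGER.log(Level.WARNING,
                    describeIgnoredFailure(target, "the next periodic scan will retry"), e);
        }
    }
}
```

Whether WARNING or INFO is the right level is still the open question; the point of the sketch is only that the message tells the operator what happened and that it is safe to ignore.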
On 1/31/13 11:01 AM, "Koushik Das" <koushik....@citrix.com> wrote:

>> On Jan. 31, 2013, 6:32 p.m., Chiradeep Vittal wrote:
>> > server/src/com/cloud/cluster/ClusterManagerImpl.java, line 371
>> > <https://reviews.apache.org/r/9133/diff/3/?file=253825#file253825line371>
>> >
>> > If the cloud operator sees this WARNING, what is he supposed to do? Should it be INFO? Should you tell him that it is safe to ignore?
>
>What is the logging guideline in the case of suppressing an exception? I see in other places in the code that a warning is logged in a similar situation. As long as there is consistency I feel that warning is fine. I would interpret the warning as some operation failed but the system can recover from that.
>
>- Koushik
>
>-----------------------------------------------------------
>This is an automatically generated e-mail. To reply, visit:
>https://reviews.apache.org/r/9133/#review15951
>-----------------------------------------------------------
>
>On Jan. 31, 2013, 9:10 a.m., Koushik Das wrote:
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/9133/
>> -----------------------------------------------------------
>>
>> (Updated Jan. 31, 2013, 9:10 a.m.)
>>
>> Review request for cloudstack, Abhinandan Prateek and Alex Huang.
>>
>> Description
>> -------
>>
>> The issue happens randomly when hosts in a cluster get distributed across multiple MS. Hosts can get split in the following scenarios:
>> a. Add host: the MS on which add host is executed takes ownership of the host. So if 2 hosts belonging to the same cluster are added from 2 different MS, the cluster gets split.
>> b. scanDirectAgentToLoad: this runs every 90 secs and checks if there are any hosts that need to be reconnected. The current logic of host scan can also lead to a split.
>>
>> The idea is to fix (b) to ensure that hosts in a cluster are managed by the same MS. For (a), only the entry in the database is going to be created, except when the host being added is the first in the cluster (in this case agent creation happens at the same time), and then (b) will take care of the connection and agent creation part. Since addHost currently only creates an entry in the db, there is a small window where the host state will be shown as 'Alert' until (b) is scheduled and picks up the host to make a connection. The MS doing add host will immediately schedule a scan task and also send a notification to peers to start the scan task.
>>
>> This addresses bug CLOUDSTACK-606.
>>
>> Diffs
>> -----
>>
>>   api/src/com/cloud/agent/api/ScheduleHostScanTaskCommand.java PRE-CREATION
>>   server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java ca0bf5c
>>   server/src/com/cloud/cluster/ClusterManagerImpl.java e341b88
>>   server/src/com/cloud/host/dao/HostDaoImpl.java 0881675
>>   server/src/com/cloud/resource/ResourceManagerImpl.java f82424a
>>
>> Diff: https://reviews.apache.org/r/9133/diff/
>>
>> Testing
>> -------
>>
>> Manually tested the following scenarios:
>>
>> - Added hostA in cluster1 from MS1; it gets owned by MS1 as the first host in the cluster. Added hostB in the same cluster1 from MS2. Once both hosts are in 'Up' state, ensure that they are owned by the same MS (i.e. MS1).
>> - Error scenarios when a host goes to disconnected, alert or down state (disconnected host from network) and is reconnected (connected to network). Ensure that once connected back, the host is owned by the same MS as the other hosts in the cluster.
>> - Have a scenario where hosts are already in a distributed state (before the fix, added hosts to the same cluster from different MSs) and ensure that after applying the patch and restarting the MSs, distribution happens properly.
>> - Did basic validation in a single MS setup: added multiple hosts in a cluster and created VMs on them.
>>
>> Thanks,
>>
>> Koushik Das
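
The ownership rule in the quoted description (all hosts in a cluster end up managed by the same MS) could be sketched roughly as below. This is not the logic in the patch, which is not shown in this thread; it is an in-memory illustration under the assumption that a scanning MS defers to whichever MS already owns a host in the same cluster. The class, its maps and the method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: an in-memory stand-in for the host table, not CloudStack code.
class ClusterOwnerSketch {
    private final Map<String, String> clusterByHost = new LinkedHashMap<>();
    private final Map<String, String> ownerByHost = new LinkedHashMap<>();

    // Returns the MS that already owns some host in the given cluster, if any.
    Optional<String> currentOwnerOfCluster(String cluster) {
        for (Map.Entry<String, String> entry : clusterByHost.entrySet()) {
            if (entry.getValue().equals(cluster)) {
                String owner = ownerByHost.get(entry.getKey());
                if (owner != null) {
                    return Optional.of(owner);
                }
            }
        }
        return Optional.empty();
    }

    // The scanning MS takes the host only when the cluster has no owner yet;
    // otherwise the host goes to the existing owner so the cluster is not split.
    String connect(String host, String cluster, String scanningMs) {
        String owner = currentOwnerOfCluster(cluster).orElse(scanningMs);
        clusterByHost.put(host, cluster);
        ownerByHost.put(host, owner);
        return owner;
    }
}
```

With this rule, the first test scenario plays out as described: hostA added from MS1 is owned by MS1, and hostB added to the same cluster from MS2 is also assigned to MS1.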