[jira] [Commented] (CLOUDSTACK-3137) systemvms HA fails on latest master (4591f94)

Koushik Das (JIRA) Wed, 03 Jul 2013 16:53:57 -0700

    [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699611#comment-13699611
 ]


Koushik Das commented on CLOUDSTACK-3137:
-----------------------------------------

System VM doesn't go through the regular HA process as they are not HA enabled 
(check vm_instance table in the db for system vm instances). The HA related 
work items for system VMs get scheduled as a part of vm sync that happens 
during host connect. This is the reason for seeing HA logs.

There is a different mechanism to ensure that system VMs are always available. 
Refer to SystemVmLoadScanner.initScan() method for that. This task scans if 
system VMs are available and if not tries to start them.

I tried the repro steps with the latest master and was not able to hit the 
issue. I rebooted the MS while the SSVM was in 'Starting' state in CS and after 
that the SSVM came up. Does the system VMs come up eventually in your setup? If 
they do then there is no issue.


                
> systemvms HA fails on latest master (4591f94)
> ---------------------------------------------
>
>                 Key: CLOUDSTACK-3137
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3137
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Prasanna Santhanam
>            Priority: Blocker
>         Attachments: logs.tar.bz2
>
>
> SystemVMs fail to start during HA on latest master 
> (4591f94a0bb69875dca769c9738998d3b5b96106)
> It looks like the HA of system VMs is failing. On our automated test 
> environment we reboot the management server before the systemVMs spin up to 
> ensure that the global settings are affected and the configuration for the 
> secstorage.internal.sites is rightly set in the secondary storage VM. The 
> management server upon reboot discovers that SSVM was put to perform HA in 
> the op_ha_work table and starts to re-spin the SSVM. At this time the CPVM is 
> also booting up. No further activity is observed after two HA attempts.
> SystemVM start before Management server start: 
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,738 DEBUG 
> [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating entries 
> for VM: VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,771 DEBUG 
> [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating nics for 
> VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,774 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[200|Public|1] with requested 
> profile NicProfile[0-0-null-null-null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,843 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[202|Control|3] with requested 
> profile null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,870 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[201|Management|2] with 
> requested profile null
> Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,916 DEBUG 
> [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm 
> VM[SecondaryStorageVm|s-1-VM] in network Ntwk[203|Storage|4] with requested 
> profile null
> ...
> ...
> Jun 24 22:30:52 cloudstack-centos63 local0: 2013-06-25 05:30:52,552 DEBUG 
> [agent.transport.Request] (consoleproxy-1:null) Seq 3-2067660811: Sending  { 
> Cmd , MgmtId: 200973787296321, via: 3, Ver: v1, Flags: 100111, 
> [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/1/1/","origUrl":"http://nfs/templates/acton/acton-systemvm-02062012.vhd.bz2","uuid":"1","id":1,"format":"VHD","accountId":1,"checksum":"f613f38c96bf039f2e5cbf92fa8ad4f8","hvm":false,"displayText":"SystemVM
>  Template 
> (XenServer)","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://nfs.fmt.vmops.com:/expo...
> Jun 24 22:31:01 cloudstack-centos63 local0: 2013-06-25 05:31:01,824 INFO  
> [context.support.XmlWebApplicationContext] (Thread-26:null) Closing Root 
> WebApplicationContext: startup date [Mon Jun 24 23:25:50 UTC 2013]; root of 
> context hierarchy
> ^^^^^^^^^^ Management Server Restarts ^^^^^^^^^^^^^^
> After restart SSVM scheduled for HA
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,394 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  
> VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,455 INFO  
> [cloud.vm.VirtualMachineManagerImpl] (Timer-1:null) Handling unfinished work 
> item: ItWork[1805a89b-d03a-4f1f-b717-ac9f5da6edea-Starting-2-Prepare]
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,504 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA:  
> VM[ConsoleProxy|v-2-VM]
> ...
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,540 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:null) Starting work
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,547 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:null) Starting work
> ...
> Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,815 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Processing 
> HAWork[1-HA-1-Starting-Investigating]
> ...
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,289 DEBUG 
> [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-0:work-2) Unable to reach the 
> agent for VM[ConsoleProxy|v-2-VM]: Resource [Host:3] is unreachable: Host 3: 
> Host with specified id is not in the right state: Disconnected
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,290 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) 
> SimpleInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,293 DEBUG 
> [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-2:work-1) Unable to reach the 
> agent for VM[SecondaryStorageVm|s-1-VM]: Resource [Host:2] is unreachable: 
> Host 2: Host with specified id is not in the right state: Disconnected
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,298 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) 
> SimpleInvestigator found VM[SecondaryStorageVm|s-1-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,336 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) 
> XenServerInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG 
> [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-2) Not a User Vm, unable 
> to determine state of VM[ConsoleProxy|v-2-VM] returning null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) null found 
> VM[ConsoleProxy|v-2-VM]to be alive? null
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG 
> [cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-2) Testing if 
> VM[ConsoleProxy|v-2-VM] is alive
> ....
> Rescheduled HA
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,623 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) Rescheduling 
> HAWork[2-HA-2-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 
> 2013
> Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,626 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Rescheduling 
> HAWork[1-HA-1-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC 
> 2013
> ...
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,586 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Processing 
> HAWork[1-HA-1-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,592 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) HA on 
> VM[SecondaryStorageVm|s-1-VM]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) VM 
> VM[SecondaryStorageVm|s-1-VM] has been changed.  Current State = Starting 
> Previous State = Starting last updated = 6 previous updated = 2
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Completed 
> HAWork[1-HA-1-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,611 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Processing 
> HAWork[2-HA-2-Starting-Investigating]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) HA on 
> VM[ConsoleProxy|v-2-VM]
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) VM 
> VM[ConsoleProxy|v-2-VM] has been changed.  Current State = Starting Previous 
> State = Starting last updated = 6 previous updated = 2
> Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO  
> [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Completed 
> HAWork[2-HA-2-Starting-Investigating]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CLOUDSTACK-3137) systemvms HA fails on latest master (4591f94)

Reply via email to