RE: Issues when vCenter becomes unavailable

Musayev, Ilya Fri, 22 Feb 2013 15:49:08 -0800

Summary:

I have 3 hypervisors.
Hypervisors 1 and 2 are down; hypervisor 3 is up. All VMs live on hypervisor 3, 
however, the host_id in the instance table for the VMs is not being updated to 
reflect the only hypervisor alive.


Details:

I physically powered off 2 hypervisors that had most of my VMs and left 1 online.

The VMs were brought back online by vCenter; however, from then on I experienced 
what Dave and Andreas mentioned.

That is, VMware VM instances are bound to a host id (hypervisor) and not to 
vCenter, and operations executed on the VMs require that hypervisor to 
stay up. If the hypervisor goes offline, the VMs still come up in VC, but CS 
cannot comprehend that these VMs now live on another hypervisor. 
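
To make the failure mode concrete, here is a toy model of what I think is going on. It is not CloudStack code; the dictionaries and the send_command helper are hypothetical names I made up for illustration, and the host numbers simply mirror the summary above.

```python
# Toy model only, not CloudStack code. All names here are hypothetical.
hosts_up = {1: False, 2: False, 3: True}  # hypervisors 1 and 2 are down
cs_host_id = {"i-2-89": 1}                # host CS still has on record for the VM
vc_host_id = {"i-2-89": 3}                # host where vCenter restarted the VM


def send_command(vm, host_map):
    """Route a command to the host recorded in host_map; fail if it is down."""
    host = host_map[vm]
    if not hosts_up[host]:
        return f"host id={host} unreachable"
    return f"ok via host id={host}"


stale = send_command("i-2-89", cs_host_id)  # routed by the stale record
fresh = send_command("i-2-89", vc_host_id)  # routed by the VM's real location
print(stale)
print(fresh)
```

If the record CS keeps were refreshed from vCenter after the failover, the first call would behave like the second.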

This is bad for production rollouts, because VMs are bound to a hypervisor ID 
and not to Virtual Center, and it appears it's not getting updated - though I do see 
in the log that CS is 

I did a little more digging, and it looks like the host_ids don't get updated in 
MySQL for VMs in the instances table. I need to double-check this because I 
totally messed up 2 of my test CloudStack clusters.

Can someone do the following test, if time allows? If not, I can try on 
Monday:

1) Pick a hypervisor for a test crash and note 1 VM on it (e.g. i-2-89)
2) Navigate to the "host" table in MySQL and note the host_id for the hypervisor 
that is about to be powered off.
3) In MySQL, go to the instances table and note the last_host_id and host_id for 
a VM on the test crash hypervisor.
4) Power off the hypervisor and let vCenter bring the VM back online
5) Attempt to launch a console on the VM that was on the crashed hypervisor and 
was powered back on by VC
6) If it fails, as it did in my case, change the value of host_id to that of the 
hypervisor the VM is now living on (my test is not clean because I've ruined the 
cluster that hosts my console VM and don't have time to work on it ATM)
7) Launch the console again to see if the issue is resolved
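
The database side of steps 3 and 6 can be sketched roughly as below. This is only a sketch under assumptions: it uses an in-memory SQLite table as a stand-in for the real MySQL database so that it runs anywhere, and the table name vm_instance, the column instance_name, and the helper names are my assumptions, so check them against your own schema first. With a MySQL driver the placeholder is usually %s rather than ?.

```python
import sqlite3


def snapshot(conn, instance_name):
    """Return (host_id, last_host_id) for one VM (step 3)."""
    return conn.execute(
        "SELECT host_id, last_host_id FROM vm_instance WHERE instance_name = ?",
        (instance_name,),
    ).fetchone()


def host_id_is_stale(before, after):
    """True if host_id did not change across the failover."""
    return before[0] == after[0]


# Stand-in for the real database: one VM recorded on hypervisor 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vm_instance "
    "(instance_name TEXT, host_id INTEGER, last_host_id INTEGER)"
)
conn.execute("INSERT INTO vm_instance VALUES ('i-2-89', 1, 1)")

before = snapshot(conn, "i-2-89")  # step 3: note the values before the crash
# Step 4 happens here: power off the hypervisor, VC restarts the VM elsewhere.
after = snapshot(conn, "i-2-89")   # re-read after the VM is back up
stale = host_id_is_stale(before, after)

# Step 6: manual workaround, point host_id at the hypervisor the VM now runs on.
conn.execute(
    "UPDATE vm_instance SET host_id = ? WHERE instance_name = ?",
    (3, "i-2-89"),
)
fixed = snapshot(conn, "i-2-89")
```

Try the UPDATE only on a test setup; editing the table by hand is a workaround for this test, not a fix.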

I suspect the host_id does not get updated, based on what I saw when 
examining the MySQL instance table, but I need to fix my env issues to confirm.

Regards
ilya


-----Original Message-----
From: Chiradeep Vittal [mailto:[email protected]] 
Sent: Friday, February 22, 2013 3:41 PM
To: [email protected]
Cc: Kelven Yang; CloudStack DeveloperList
Subject: Re: Issues when vCenter becomes unavailable

CC'ing Kelven to see if he has any ideas.

On 2/22/13 12:22 PM, "Dave Dunaway" <[email protected]> wrote:

>If I may suggest also testing a disconnect of a host (hypervisor) from 
>vcenter, so that vcenter and CS can still talk, but vcenter cannot talk 
>to the hosts (hypervisors). CS marks the host as down or failed or whatever.
>
>When the host comes back up, vcenter can talk to it just fine and all seems good.
>That however is not the case (I had this with CS 3.0.5 and vmware esxi
>5.0)
>when CS tries to talk to vcenter and the previously disconnected host 
>(that is now recovered).
>
>What we experienced was that we had to migrate all guests off the 
>recovered host, and then destroy that host in CS, and re-create it. 
>Then we could migrate back onto it the guests which had been previously 
>migrated.
>
>The curious thing is that while CS did not want to send commands to the 
>host (it kept on saying host id=X has timedout when whatever command 
>was sent to it), CS WAS polling the host for resources and getting the 
>correct numbers.... so CS could in some ways talk to the host (ie: it 
>knew the capabilities, number of VMs on it, etc).
>
>Luckily for me this all happened in a test environment. In production, 
>this would have been a real nightmare!
>
>
>dave
>
>
>On Fri, Feb 22, 2013 at 2:48 PM, Musayev, Ilya <[email protected]> wrote:
>
>> Andi
>>
>> I'm on CS4.0. I simulated the VMWare VCenter 5 failure by adding a 
>>bogus  IP entry in /etc/hosts for 10 minutes for virtual center host. 
>>That in turn  made VC unreachable by CS.
>>
>> I then began executing commands and sure enough commands failed or 
>> backlogged. Once I restored VC connectivity, the backlogged commands 
>> executed and I did not experience any abnormalities.
>>
>> I will redo this test and leave VC off for an hour - maybe I need a 
>>longer outage.
>>
>> Regards
>> ilya
>>
>>
>>
>> -----Original Message-----
>> From: Musayev, Ilya
>> Sent: Thursday, February 21, 2013 2:43 PM
>> To: [email protected]
>> Subject: RE: Issues when vCenter becomes unavailable
>>
>> This is definitely not the behavior we want with vcenter.
>>
>> I will test this out on my lab setup shortly.
>>
>> Thanks
>> ilya
>>
>> -----Original Message-----
>> From: Chip Childers [mailto:[email protected]]
>> Sent: Thursday, February 21, 2013 9:40 AM
>> To: [email protected]
>> Subject: Re: Issues when vCenter becomes unavailable
>>
>> On Thu, Feb 21, 2013 at 08:59:14AM -0500, Mathias Mullins wrote:
>> > Andreas,
>> >
>> > The open source community doesn't support the Citrix version 3.0.6.
>> > You need to report this via your Citrix Support contract. Sounds 
>> > like this could be a bug.
>> >
>> > Community - this could be a possible issue in 4.0.0 / 4.0.1. I 
>> > don't know if this test case has been explored.
>>
>> Thx - I forwarded to [email protected] to get the test engineers in the 
>> community to take a look.
>>
>> >
>> > Thanks,
>> > Matt Mullins
>> > CloudPlatform Implementation Engineer Worldwide Cloud Services  
>> > Citrix System, Inc.
>> > +1 (407) 920-1107  Office/Cell Phone
>> > [email protected]
>> >
>> >
>> >
>> > On 2/21/13 5:35 AM, "Fuchs, Andreas (SwissTXT)"
>> > <[email protected]> wrote:
>> >
>> > >Hi CS Users
>> > >
>> > >We are running CS 3.0.6 on a vSphere platform and found a strange 
>> > >behavior.
>> > >
>> > >When the vCenter becomes unavailable due to a reboot or some other 
>> > >issue, it seems that CS is shutting down instances when vCenter 
>> > >becomes available again.
>> > >
>> > >What we think happens:
>> > >1. vCenter becomes unreachable
>> > >2. CS marks the ESX servers as "down"
>> > >3. We think this leads to: CS marks the instances as down as well
>> > >4. When vCenter becomes available again, CS stops the "marked as down"
>> > >instances
>> > >
>> > >This is very bad as the instances were running all the time and 
>> > >the shutdown issued by CS is forcing a service interruption.
>> > >
>> > >My problem is that I cannot really reproduce it, as a lot of testing 
>> > >is ongoing on the platform at the moment, so my question:
>> > >
>> > >Does someone else see this issue as well and can maybe reproduce?
>> > >Is there a workaround to it, can I change some flag or something 
>> > >which tells CS to never shut down an instance by itself?
>> > >Why are the ESX hosts getting marked as down and not unreachable 
>> > >or something?
>> > >
>> > >Best regards
>> > >Andi
>> >
>> >
>>
>>
>>
