Hi David,

Thank you for your questions.


As per the FS, the HA framework implementation is resource-agnostic and is not 
tied to how HA is performed, which separates policy from mechanism. Fencing is 
implemented by an HA provider, which is implementation-specific.


The first version will include an HA provider for KVM (with NFS-backed primary 
storage). In it, we've chosen to put a host into maintenance mode when it is 
fenced (via OOBM/IPMI) and to require the admin to manually return it to the 
pool (i.e. take it out of maintenance mode), because doing this automatically 
may have side effects. Also, because the HA framework is separated from the 
hypervisor- and storage-specific logic, anyone is free to implement their own 
HA provider as a plugin, with custom logic, options and algorithms.
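
To illustrate the policy/mechanism split, here is a minimal sketch in Java. All 
names in it (HaProvider, HaFramework, HostState) are hypothetical and are not 
taken from the FS or the CloudStack code base; it only assumes that the 
framework places a host in maintenance mode before asking the provider to 
fence it, and that only an admin takes it out again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not the CloudStack API.
enum HostState { AVAILABLE, IN_MAINTENANCE }

/** Mechanism: implemented per hypervisor/storage combination, as a plugin. */
interface HaProvider {
    boolean isHealthy(String hostId);  // e.g. an activity check against NFS storage
    boolean fence(String hostId);      // e.g. power the host off through OOBM/IPMI
}

/** Policy: resource-agnostic logic that only talks to the provider interface. */
final class HaFramework {
    private final HaProvider provider;
    private final List<String> log = new ArrayList<>();

    HaFramework(HaProvider provider) {
        this.provider = provider;
    }

    /** Checks one host; an unhealthy host goes into maintenance and is then fenced. */
    HostState check(String hostId) {
        if (provider.isHealthy(hostId)) {
            return HostState.AVAILABLE;
        }
        // Maintenance mode first, then fence. Nothing here brings the host back:
        // in this sketch that step is left to the admin.
        log.add("maintenance:" + hostId);
        boolean fenced = provider.fence(hostId);
        log.add((fenced ? "fenced:" : "fence-failed:") + hostId);
        return HostState.IN_MAINTENANCE;
    }

    /** Returns the recorded actions, in the order they were taken. */
    List<String> log() {
        return log;
    }
}
```

A Ceph-backed provider would, under the same assumptions, be another 
implementation of the interface, with no change to the framework class.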


We can start by getting the HA framework and some initial HA providers (driver 
implementations) reviewed and accepted; over time, support for other 
hypervisors and storage options such as Ceph can be added.


Regards.


________________________________
From: David Mabry <dma...@ena.com>
Sent: 18 February 2017 03:40
To: dev@cloudstack.apache.org
Subject: Re: [DISCUSS][FS] Host HA for CloudStack



Rohit,

First, thanks for all the work you have put into this.  This is something that 
CS has sorely needed for a long time.

A couple of items:

1.) You state the following:
“Before invoking the HA provider’s fence operation, the HA resource management 
will place the resource in maintenance mode. The intention is to require an 
administrator to manually verify that a resource is ready to return service by 
requiring an administrator to take it out of maintenance mode.”
I agree that putting a host in maintenance mode to require manual intervention 
in order to bring it back online is ideal and honestly how I would probably 
prefer to do it.  However, I also like to give the end user/operator choice.  
Perhaps we could add an option to bring the Host out of Maintenance mode 
automatically if it passes all checks and comes back into an ELIGIBLE state.  
This way, if the operator chooses, the host could come back into full operation 
and start recovering VMs if needed.  This could also be handy if your 
environment isn’t quite n+1 when it comes to host capacity and you need to have 
the host back up and running as soon as possible to minimize the outage 
duration.  Again, I know it isn’t ideal, but I don’t see the harm in giving the 
operator the choice.

2.) You state the following:
“For the initial release, only KVM with NFS storage will be supported. However, 
the storage check component will be implemented in a modular fashion allowing 
for checks using other storage platforms(e.g. Ceph) in the future. HA provider 
plugins can be implemented for other hypervisors.”
We are using KVM with a Ceph backend and would be very interested in helping 
make it a part of the initial push for this feature.  I have a Dev environment 
backed by Ceph that we could use for testing and would be willing to help with 
the development of the Ceph activity checks.

I’m looking forward to getting this feature added to CS.  Again, great job 
putting this together and starting the conversation.

Thanks,
Mabry


rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HS UK
@shapeblue
  
 
