koushik-das commented on issue #1960: [4.11/Future] CLOUDSTACK-9782: Host HA 
and KVM HA provider
URL: https://github.com/apache/cloudstack/pull/1960#issuecomment-298605864
 
 
   @rhtyd Sorry for the late reply, got busy with other stuff.
   
   As I understand it, 'Host-HA' has more to do with the lifecycle of a host in 
the event of failure (detecting host failure, investigation, fencing, etc.). The 
name is confusing; something more appropriate should have been used to 
indicate the intent. To me, 'Host-HA' suggests something like bringing up an 
identical host in the event of host failure.
   
   I would suggest you read 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/High+Availability+Developer%27s+Guide
 to get a clear idea of the existing VM HA framework.
   
   Let me quote some specific lines from your last comment and provide my 
comments.
   
   >>>> With this PR, we've presented two Host-HA plugins (a) a HA provider 
simulator to verify the framework in a deterministic way that's not possible 
with the status quo, and (b) a HA provider for KVM that uses out-of-band 
management.
   
   Point (a) is not correct. In a previous reply on dev@... I mentioned that 
simulator-based test cases already exist for testing the HA framework. I would 
suggest you check them out.
   
   Regarding (b), why can't OOBM be used in the existing KVMFencer and 
KVMInvestigator? I would like to understand what is preventing us from doing so.
   
   >>>> For current VM-HA VMs are first class resource/objects, while for 
Host-HA at the framework level the first class resource/objects are agnostic 
but the implementation is not and is as per host/hypervisor specific plugins...
   
   The CS user (not admin or operator) deals with VMs, so it makes sense to 
provide HA at the VM level. The user is not aware of the HV hosts on which the 
VMs are running. As long as the VM is properly handled in the event of failure, 
things are good.
   
   >>>> For Host-HA to work and execute the recovery and fence operations -- VM 
HA should not kick-in before host is successfully investigated, recovered, 
fenced etc.
   
   In the existing framework, the host (or agent) is investigated first, and 
based on the result, actions are taken on the VMs running on it. Refer to the 
link I have shared for a better understanding.
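
   To make that ordering concrete, here is a rough sketch of the flow as I 
read it: investigate the host first, and only act on its VMs once the host is 
confirmed down. This is not CloudStack code. The names (`Host`, `HostStatus`, 
`investigate`, `handle_suspect_host`) are hypothetical, and fencing the host 
before the VMs are handed over for restart is my assumption about the order.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class HostStatus(Enum):
    UP = "Up"
    DOWN = "Down"


@dataclass
class Host:
    # Hypothetical stand-in for a hypervisor host and the VMs placed on it.
    name: str
    vms: List[str] = field(default_factory=list)


# An investigator returns a status, or None when it cannot decide.
Investigator = Callable[[Host], Optional[HostStatus]]


def investigate(host: Host, investigators: List[Investigator]) -> Optional[HostStatus]:
    """Ask each investigator in turn; the first definite answer wins."""
    for probe in investigators:
        status = probe(host)
        if status is not None:
            return status
    return None


def handle_suspect_host(host: Host,
                        investigators: List[Investigator],
                        fence: Callable[[Host], bool]) -> List[str]:
    """Return the VMs that may be restarted elsewhere (empty if none)."""
    status = investigate(host, investigators)
    if status is not HostStatus.DOWN:
        # Host is up, or nobody could decide: leave the VMs alone.
        return []
    if not fence(host):
        # Assumption: an unfenced host may still write to disk, so do nothing.
        return []
    return list(host.vms)
```

   The point of the sketch is only that the VM-level action is a consequence 
of the host-level investigation, not something that runs independently of it.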
   
   
   >>>> The current VM HA framework does a poor job at reliably fencing a host, 
at least on KVM based large prod. environments this has shown to cause disk 
corruptions.
   
   You are mixing up the VM HA framework with the KVM HA plugin (KVM Fencer, 
KVM Investigator). If the framework were not up to the mark, there would have 
been issues in all HVs and not only KVM. Why can't the KVM plugin be reimplemented 
to use OOBM or any other mechanism to detect host status and do fencing on the VMs?
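
   To illustrate the framework/plugin distinction I am drawing, here is a 
minimal sketch in which the framework only dispatches to a per-hypervisor 
fencer and the fencing mechanism lives entirely in the plugin. Everything here 
is hypothetical (`Fencer`, `OobmFencer`, `Framework`, and the `power_off` 
callable standing in for an out-of-band power-off); it does not reflect the 
actual CloudStack interfaces.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class Fencer(ABC):
    """Hypothetical plugin contract: the framework only ever calls fence()."""

    @abstractmethod
    def fence(self, host_name: str) -> bool:
        ...


class OobmFencer(Fencer):
    """Hypothetical KVM fencer that delegates to out-of-band management."""

    def __init__(self, power_off: Callable[[str], bool]):
        # power_off is an assumed stand-in for an out-of-band power-off call.
        self._power_off = power_off

    def fence(self, host_name: str) -> bool:
        return bool(self._power_off(host_name))


class Framework:
    """Hypervisor-agnostic dispatcher; it knows nothing about OOBM."""

    def __init__(self) -> None:
        self._fencers: Dict[str, Fencer] = {}

    def register(self, hypervisor: str, fencer: Fencer) -> None:
        self._fencers[hypervisor] = fencer

    def fence_host(self, hypervisor: str, host_name: str) -> bool:
        fencer = self._fencers.get(hypervisor)
        # No plugin registered for this hypervisor: report failure.
        return fencer is not None and fencer.fence(host_name)
```

   In a layout like this, swapping the KVM fencing mechanism means replacing 
one registered plugin; the dispatching code does not change.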
   
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
