Wido den Hollander created CLOUDSTACK-8643:
----------------------------------------------

             Summary: Helper for KVM High Availability
                 Key: CLOUDSTACK-8643
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8643
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: KVM, Management Server
         Environment: KVM hypervisors
            Reporter: Wido den Hollander
             Fix For: Future


When running KVM with NFS storage, all Agents write a heartbeat to the NFS 
share.

Should an Agent go down, it will still be writing heartbeats, even if libvirt 
has died.

Using these heartbeats, the Management Server can ask other KVM Agents whether 
the host in question is still beating. If not, it can fence that host.

While this works, I've also encountered scenarios where you run without NFS 
and still want investigators.

My proposal would be an Agent Helper running NEXT to the Agent itself.

A simple Python daemon running a Basic HTTP server which queries libvirt every 
X seconds about:
* Running Instances
* Storage pools

It keeps this in memory, so that even when libvirt goes down it knows what the 
last state was.
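
To make the idea concrete, here is a rough sketch of such a helper. It is 
only an illustration, not an existing CloudStack component: the names 
StateCache, query_libvirt, make_handler and serve, the JSON layout, the 
10 second interval and port 8765 are all assumptions made for this example. 
It uses the libvirt Python bindings for the queries and keeps the last good 
answer when libvirt stops responding.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def query_libvirt(uri="qemu:///system"):
    """Return (running instance names, active storage pool names)."""
    import libvirt  # imported lazily so the module loads without the bindings

    conn = libvirt.openReadOnly(uri)
    try:
        domains = conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE)
        pools = conn.listAllStoragePools(
            libvirt.VIR_CONNECT_LIST_STORAGE_POOLS_ACTIVE)
        return [d.name() for d in domains], [p.name() for p in pools]
    finally:
        conn.close()


class StateCache:
    """Hypothetical in-memory store for the last state libvirt reported."""

    def __init__(self, query=query_libvirt):
        self._query = query
        self._lock = threading.Lock()
        self._state = {"instances": [], "pools": [], "libvirt_ok": False}

    def refresh(self):
        try:
            instances, pools = self._query()
        except Exception:
            # libvirt is unreachable: keep the previous lists, flag the failure
            with self._lock:
                self._state["libvirt_ok"] = False
            return
        with self._lock:
            self._state = {"instances": instances, "pools": pools,
                           "libvirt_ok": True}

    def snapshot(self):
        with self._lock:
            return dict(self._state)


def make_handler(cache):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Every GET returns the cached state as JSON
            body = json.dumps(cache.snapshot()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def serve(cache, interval=10, port=8765):
    """Poll libvirt every `interval` seconds and serve the cache over HTTP."""
    stop = threading.Event()

    def poll():
        while not stop.is_set():
            cache.refresh()
            stop.wait(interval)

    threading.Thread(target=poll, daemon=True).start()
    ThreadingHTTPServer(("", port), make_handler(cache)).serve_forever()
```

Calling serve(StateCache()) on a hypervisor would start the polling loop and 
the HTTP server. The libvirt_ok flag lets a caller distinguish a fresh answer 
from a cached one.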

Using the QEMU Monitor sockets we can actually see if the guests we have in 
memory are still online.

If they are, we simply keep the list.
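
A check like that could look roughly as follows. This is a sketch under 
assumptions: it speaks QMP (the JSON monitor protocol) over a UNIX socket, 
and guest_is_running and its parameters are names made up for this example. 
libvirt normally holds the monitor socket of each guest itself, so the sketch 
assumes the helper has a separate QMP socket it can connect to.

```python
import json
import socket


def _read_reply(reader):
    """Read QMP messages until one is not an asynchronous event."""
    while True:
        msg = json.loads(reader.readline())
        if "event" not in msg:
            return msg


def guest_is_running(monitor_path, timeout=2.0):
    """Return True if the guest behind this QMP socket reports it is running."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(monitor_path)
            with sock.makefile("r") as reader:
                json.loads(reader.readline())  # QMP greeting banner
                # Capabilities negotiation is required before other commands
                sock.sendall(b'{"execute": "qmp_capabilities"}\r\n')
                _read_reply(reader)
                sock.sendall(b'{"execute": "query-status"}\r\n')
                reply = _read_reply(reader)
                return bool(reply["return"]["running"])
    except (OSError, ValueError, KeyError, TypeError):
        # Socket missing, timed out, closed early, or an unexpected reply
        return False
```

Any failure is treated as "not running", so a guest only stays in the list 
when its QEMU process answers.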

Now, if an investigator comes by and wants to know if the host is still up, it 
can ALSO ask the helper.

The management server can ask the helper, but the other agents could as well.

This doesn't work in all cases, e.g. where storage is lost. But an additional 
helper would be useful to catch scenarios where the Agent itself became 
unresponsive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
