Slair1 edited a comment on pull request #3915:
URL: https://github.com/apache/cloudstack/pull/3915#issuecomment-879948903


   > @ggoodrich-ipp @nvazquez
   > I do not think we can merge this pr for now.
   > 
   > 1. The VR has HA, so CloudStack will start it on another host if its host is 
determined to be Down. Hence there are two duplicate VRs running (on the old and 
new hosts). This PR cannot solve that issue.
   > 2. If CloudStack does not start the VR on another host because the host is Up 
again, the control IP of the VR is not changed, so this PR is not needed.
   > 3. If the VR is started out-of-band (e.g. virsh start), CheckRouter checks 
whether the control IP is reachable. We do not know if iptables rules or 
services are configured correctly.
   > 
   > @ggoodrich-ipp did you face this issue in a real environment, or 
reproduce it (without hacking the DB) in a test environment?
   @nvazquez 
   
   We did face this issue in a real environment. The scenario is when the KVM 
agent is stopped and CloudStack thinks the host is down, but the host is in 
fact up and all VMs are still up; it is just the agent that is down. In this 
scenario (I think the original PR description is accurate):
   
   VM HA runs for the router and as part of that, its 169.x.x.x control IP is 
unallocated. Then, it tries to power on the router on another host, and as part 
of that process it allocates a NEW 169.x.x.x control IP and writes that to the 
DB. However, since the router isn't actually down (host is up, just agent is 
down) the VM HA then fails (as the vRouter is currently still running on the 
problem host).  At this point, the DB is already changed - the control IP is 
changed.
   
   Next, in this scenario, when the host agent is back online again, it sends a 
power report to the management servers, and the management servers see the 
router as ON. However, the GUI will not show a control IP for the vRouter, and 
the DB will have the NEW control IP it tried to allocate during the failed VM HA 
event. This leaves us unable to communicate with the vRouter.
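
To make the sequence above easier to follow, here is a rough Python sketch of the state drift. It is only a toy model, not CloudStack code: `RouterRecord`, `RunningRouter`, `failed_ha_restart`, `power_report` and `reachable` are hypothetical names, and the 169.254.x.x addresses are made-up examples of a 169.x.x.x control IP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouterRecord:
    """Hypothetical stand-in for what the DB stores about the router."""
    control_ip: Optional[str]
    state: str


@dataclass
class RunningRouter:
    """The router VM as it actually runs on the KVM host."""
    control_ip: str  # set when the VM booted; the HA attempt never touches it


def failed_ha_restart(record: RouterRecord, new_ip: str) -> None:
    """Model the VM HA attempt made while only the agent is down."""
    record.control_ip = None    # the old control IP is unallocated
    record.control_ip = new_ip  # a NEW control IP is allocated and written to the DB
    # The start on another host then fails because the router is still running
    # on the original host. Nothing here restores the old control IP.


def power_report(record: RouterRecord) -> None:
    """Model the agent coming back and reporting the router as powered on."""
    record.state = "Running"  # state is refreshed, the control IP is not


def reachable(record: RouterRecord, router: RunningRouter) -> bool:
    """Management can only reach the router if both sides agree on the IP."""
    return record.control_ip == router.control_ip


record = RouterRecord(control_ip="169.254.0.10", state="Running")
router = RunningRouter(control_ip="169.254.0.10")

failed_ha_restart(record, new_ip="169.254.0.20")
power_report(record)

print(record.state, record.control_ip, router.control_ip, reachable(record, router))
```

In this model the record ends up marked as running but pointing at an address the live router was never configured with, which is the mismatch described above.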


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
