actually not only this issue, but more (1) restart vpc router will break the network if there are multiple tiers in the vpc. (2) check_heartbeat.sh does not work.
I will create tickets for them next Monday, and create PR on github. -Wei 2016-04-01 21:14 GMT+02:00 Wei ZHOU <ustcweiz...@gmail.com>: > Dean, > > I just fixed it yesterday. > > the commit is > > --- > > diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > index 5f63c06..5256d03 100755 > --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > @@ -27,7 +27,7 @@ from CsRoute import CsRoute > from CsRule import CsRule > > VRRP_TYPES = ['guest'] > -PUBLIC_INTERFACE = ['eth1'] > +VPC_PUBLIC_INTERFACE = ['eth1'] > > class CsAddress(CsDataBag): > > @@ -323,7 +323,7 @@ class CsIP: > # If redundant only bring up public interfaces that are > not eth1. > # Reason: private gateways are public interfaces. > # master.py and keepalived will deal with eth1 public > interface. > - if self.cl.is_redundant() and (not self.is_public() or > self.getDevice() not in PUBLIC_INTERFACE): > + if self.cl.is_redundant() and (not self.is_public() or > (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)): > CsHelper.execute(cmd2) > # if not redundant bring everything up > if not self.cl.is_redundant(): > > --- > > - Wei > > 2016-04-01 20:13 GMT+02:00 Dean Close <dean.cl...@icloudhosting.com>: > >> Hi guys, >> >> I had been investigating a possible bug with the way interfaces are >> managed on virtual routers. The public interfaces are being brought up on >> backup routers and (because they boot second) they arp the IPs away from >> the master. I'd been examining an idea for a fix but whilst doing so I >> found that the system appears to be designed to bring up these interfaces. >> >> I suspect that a few things need to be reworked - but the changes >> necessary go so far against what has been implemented that I wanted to open >> this up before doing the work. >> >> Hopefully if I go through my findings you guys can help me see what I >> might be getting wrong. >> >> The following was correct for pre-4.6 redundant routers: >> >> 1. Both routers get configured with IP addresses, routes and iptables >> rules. >> 2. Public interfaces are initially set as DOWN. >> 3. Keepalived runs a VRRP instance on the private interface (eth0) to >> negotiate MASTER/BACKUP roles. >> 4. Keepalived manages the virtual IP on eth0 used as the public gateway >> for the guest VMs. >> 5. Keepalived uses a master notify script to bring up the public >> interfaces. >> >> The above was true for pre-4.6 routers. Now, however, things appear to >> work differently: >> >> 1. Both routers get configured as before. >> 2. All interfaces apart from eth1 (the Hypervisor-link interface) are >> set as UP. >> 3. Keepalived runs a VRRP instance on the first public interface (eth2) >> to negotiate MASTER/BACKUP roles. >> 4. Keepalived manages the virtual IP as before. >> 5. Keepalived uses a master notify script to bring up the public >> interfaces (unnecessary) >> 6. Keepalived uses a backup notify script to bring down the public >> interfaces (unused) >> >> This is unexpected for the following reasons: >> >> 1. The keepalived notify script brings the public interfaces down when >> transitioning to BACKUP - so how can we expect to run a VRRP instance over >> eth2? >> 2. If interfaces are down when transitioning to BACKUP, why are they >> not expected to be down to begin with? (Before the router has become MASTER) >> 3. Why are we running a VRRP instance over an interface with an IP that >> will clash with another host on the network? >> >> The following method from the CsIP class in >> /opt/cloud/bin/cs/CsAddress.py confuses matters futher: >> >> def check_is_up(self): >> """ Ensure device is up """ >> cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice() >> for i in CsHelper.execute(cmd): >> if " DOWN " in i: >> cmd2 = "ip link set %s up" % self.getDevice() >> # If redundant only bring up public interfaces that are >> not eth1. >> # Reason: private gateways are public interfaces. >> # master.py and keepalived will deal with eth1 public >> interface. >> if self.cl.is_redundant() and (not self.is_public() or >> self.getDevice() not in PUBLIC_INTERFACE): >> CsHelper.execute(cmd2) >> # if not redundant bring everything up >> if not self.cl.is_redundant(): >> CsHelper.execute(cmd2) >> >> The comments refer to eth1 as a public interface when this is the link to >> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1']. >> But keepalived and master.py don't influence eth1 at all. This looks like a >> mistake. >> >> Lastly, the logic of this line looks flawed: >> >> if self.cl.is_redundant() and (not self.is_public() or self.getDevice() >> not in PUBLIC_INTERFACE) >> >> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will >> be ignored. Public IPs will never be assigned to eth1, so this line >> evaluates as: >> >> >> if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE) >> >> which reduces even further to: >> >> if self.cs.is_redundant() and self.is_control() >> >> >> What would need doing >> --------------------- >> >> 1. The keepalived.conf template would need to be changed to run the >> VRRP instance over eth0. >> 2. The check_is_up method of the CsIP class should be renamed to >> 'bring_up_interfaces'. For redundant routers it should ignore IPs that pass >> is_public or needs_vrrp. >> 3. The arpPing method should do nothing if the interface is down. >> 4. The PUBLIC_INTERFACE constant should be either renamed or dropped >> altogether. >> 5. Other things that I haven't considered? >> >> >> I'd really appreciate any feedback on this. It's possible that I've got >> it all wrong but I'm suspecting not. I just don't want to tread on anyone's >> toes by submitting a PR that goes against what appears to be an explicit >> design decision. >> >> >> Kind regards, >> >> Dean Close >> iCloudHosting.com >> http://www.icloudhosting.com >> Tel: 01582 227927 >> >> Unit 2, Smallmead Road, Reading RG2 0QS >> >> ****************************************************************** >> The names iCloudHosting and iCloudHosting.com are trading styles of BBS >> Commerce Ltd which is registered in England and Wales, Company Number >> 04837714. Please use our trading address above for mail. Our registered >> office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading, >> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94. >> >> This email and any files transmitted with it are confidential and >> intended solely for the use of the individual or entity to whom they are >> addressed. If you are not the intended recipient you are not authorised to >> and must not disclose, copy, distribute, or retain this message or any part >> of it. >> >> iCloudHosting accepts no responsibility for information, errors or >> omissions in this email. >> ****************************************************************** >> >> >