Dean, I just fixed it yesterday.
the commit is --- diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py index 5f63c06..5256d03 100755 --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py @@ -27,7 +27,7 @@ from CsRoute import CsRoute from CsRule import CsRule VRRP_TYPES = ['guest'] -PUBLIC_INTERFACE = ['eth1'] +VPC_PUBLIC_INTERFACE = ['eth1'] class CsAddress(CsDataBag): @@ -323,7 +323,7 @@ class CsIP: # If redundant only bring up public interfaces that are not eth1. # Reason: private gateways are public interfaces. # master.py and keepalived will deal with eth1 public interface. - if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not in PUBLIC_INTERFACE): + if self.cl.is_redundant() and (not self.is_public() or (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)): CsHelper.execute(cmd2) # if not redundant bring everything up if not self.cl.is_redundant(): --- - Wei 2016-04-01 20:13 GMT+02:00 Dean Close <dean.cl...@icloudhosting.com>: > Hi guys, > > I had been investigating a possible bug with the way interfaces are > managed on virtual routers. The public interfaces are being brought up on > backup routers and (because they boot second) they arp the IPs away from > the master. I'd been examining an idea for a fix but whilst doing so I > found that the system appears to be designed to bring up these interfaces. > > I suspect that a few things need to be reworked - but the changes > necessary go so far against what has been implemented that I wanted to open > this up before doing the work. > > Hopefully if I go through my findings you guys can help me see what I > might be getting wrong. > > The following was correct for pre-4.6 redundant routers: > > 1. Both routers get configured with IP addresses, routes and iptables > rules. > 2. Public interfaces are initially set as DOWN. > 3. Keepalived runs a VRRP instance on the private interface (eth0) to > negotiate MASTER/BACKUP roles. > 4. Keepalived manages the virtual IP on eth0 used as the public gateway > for the guest VMs. > 5. Keepalived uses a master notify script to bring up the public > interfaces. > > The above was true for pre-4.6 routers. Now, however, things appear to > work differently: > > 1. Both routers get configured as before. > 2. All interfaces apart from eth1 (the Hypervisor-link interface) are > set as UP. > 3. Keepalived runs a VRRP instance on the first public interface (eth2) > to negotiate MASTER/BACKUP roles. > 4. Keepalived manages the virtual IP as before. > 5. Keepalived uses a master notify script to bring up the public > interfaces (unnecessary) > 6. Keepalived uses a backup notify script to bring down the public > interfaces (unused) > > This is unexpected for the following reasons: > > 1. The keepalived notify script brings the public interfaces down when > transitioning to BACKUP - so how can we expect to run a VRRP instance over > eth2? > 2. If interfaces are down when transitioning to BACKUP, why are they not > expected to be down to begin with? (Before the router has become MASTER) > 3. Why are we running a VRRP instance over an interface with an IP that > will clash with another host on the network? > > The following method from the CsIP class in /opt/cloud/bin/cs/CsAddress.py > confuses matters futher: > > def check_is_up(self): > """ Ensure device is up """ > cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice() > for i in CsHelper.execute(cmd): > if " DOWN " in i: > cmd2 = "ip link set %s up" % self.getDevice() > # If redundant only bring up public interfaces that are > not eth1. > # Reason: private gateways are public interfaces. > # master.py and keepalived will deal with eth1 public > interface. > if self.cl.is_redundant() and (not self.is_public() or > self.getDevice() not in PUBLIC_INTERFACE): > CsHelper.execute(cmd2) > # if not redundant bring everything up > if not self.cl.is_redundant(): > CsHelper.execute(cmd2) > > The comments refer to eth1 as a public interface when this is the link to > the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1']. > But keepalived and master.py don't influence eth1 at all. This looks like a > mistake. > > Lastly, the logic of this line looks flawed: > > if self.cl.is_redundant() and (not self.is_public() or self.getDevice() > not in PUBLIC_INTERFACE) > > As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will be > ignored. Public IPs will never be assigned to eth1, so this line evaluates > as: > > > if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE) > > which reduces even further to: > > if self.cs.is_redundant() and self.is_control() > > > What would need doing > --------------------- > > 1. The keepalived.conf template would need to be changed to run the VRRP > instance over eth0. > 2. The check_is_up method of the CsIP class should be renamed to > 'bring_up_interfaces'. For redundant routers it should ignore IPs that pass > is_public or needs_vrrp. > 3. The arpPing method should do nothing if the interface is down. > 4. The PUBLIC_INTERFACE constant should be either renamed or dropped > altogether. > 5. Other things that I haven't considered? > > > I'd really appreciate any feedback on this. It's possible that I've got it > all wrong but I'm suspecting not. I just don't want to tread on anyone's > toes by submitting a PR that goes against what appears to be an explicit > design decision. > > > Kind regards, > > Dean Close > iCloudHosting.com > http://www.icloudhosting.com > Tel: 01582 227927 > > Unit 2, Smallmead Road, Reading RG2 0QS > > ****************************************************************** > The names iCloudHosting and iCloudHosting.com are trading styles of BBS > Commerce Ltd which is registered in England and Wales, Company Number > 04837714. Please use our trading address above for mail. Our registered > office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading, > Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94. > > This email and any files transmitted with it are confidential and intended > solely for the use of the individual or entity to whom they are addressed. > If you are not the intended recipient you are not authorised to and must > not disclose, copy, distribute, or retain this message or any part of it. > > iCloudHosting accepts no responsibility for information, errors or > omissions in this email. > ****************************************************************** > >