Hi Wei, I implemented your modifications on a test cloud we have here and it worked well. Redundant routers in the backup state booted with public interfaces down.
As you said, they also negotiated the VRRP instance over eth0 which is correct. As an aside, regarding the following code: if self.cl.is_redundant() and (not self.is_public() or (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)): I expanded this out to help me make sense of the logic. I also disposed of the preceding comment block as it was misleading in non-VPC contexts. The full block I'm running here is: # if not redundant bring everything up if not self.cl.is_redundant(): CsHelper.execute(cmd2) # Otherwise only bring up non-public interfaces elif self.cl.is_redundant() and self.config.is_vpc() and \ (self.getDevice() not in VPC_PUBLIC_INTERFACE): CsHelper.execute(cmd2) elif self.cl.is_redundant() and not self.is_public(): CsHelper.execute(cmd2) Dean -----Original Message----- From: Wei ZHOU [mailto:ustcweiz...@gmail.com] Sent: 02 April 2016 07:02 To: dev@cloudstack.apache.org Subject: Re: Redundant Router Interfaces Hi Dean, the VRRP instance will be set to eth0 (guest device) in router, and eth2 (the first guest device ) in vpc router. I think it is good. -Wei 2016-04-02 0:22 GMT+02:00 Dean Close <dean.cl...@icloudhosting.com>: > Hi Wei, > > It looks like your patch will prevent redundant routers from bringing > up public interfaces. We also need to: > > 1. shift the VRRP instance over to eth0 as eth2 will be down on the backup. > 2. Ensure that eth0 and eth1 are up on both redundant virtual routers. > > Your patch might already solve point 2. What do you think? > > Point 1 will require modifying the keepalived.conf template. Do you > want to handle this in your PR or shall I arrange that? > > Kind Regards, > > Dean Close > iCloudHosting.com > http://www.icloudhosting.com > Tel: 01582 227927 > > Spectrum House, Dunstable Road, Redbourn, AL3 7PR > > > ****************************************************************** > > The names iCloudHosting and iCloudHosting.com are trading styles of > BBS Commerce Ltd which is registered in England and Wales, Company > Number 04837714. Please use our trading address above for mail. Our > registered office is 5 Theale Lakes Business Park, Moulden Way, > Sulhamstead, Reading, Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 > 94. > > This email and any files transmitted with it are confidential and > intended solely for the use of the individual or entity to whom they are > addressed. > If you are not the intended recipient you are not authorised to and > must not disclose, copy, distribute, or retain this message or any part of it. > > iCloudHosting accepts no responsibility for information, errors or > omissions in this email. > > ****************************************************************** > > > > > On Fri, Apr 1, 2016 at 8:34 pm, Wei ZHOU <ustcweiz...@gmail.com> wrote: > actually not only this issue, but more > (1) restart vpc router will break the network if there are multiple > tiers in the vpc. > (2) check_heartbeat.sh does not work. > > I will create tickets for them next Monday, and create PR on github. > > -Wei > > 2016-04-01 21:14 GMT+02:00 Wei ZHOU <ustcweiz...@gmail.com>: > > > Dean, > > > > I just fixed it yesterday. > > > > the commit is > > > > --- > > > > diff --git > > a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > > b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > > index 5f63c06..5256d03 100755 > > --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > > +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py > > @@ -27,7 +27,7 @@ from CsRoute import CsRoute from CsRule import > > CsRule > > > > VRRP_TYPES = ['guest'] > > -PUBLIC_INTERFACE = ['eth1'] > > +VPC_PUBLIC_INTERFACE = ['eth1'] > > > > class CsAddress(CsDataBag): > > > > @@ -323,7 +323,7 @@ class CsIP: > > # If redundant only bring up public interfaces that are not eth1. > > # Reason: private gateways are public interfaces. > > # master.py and keepalived will deal with eth1 public interface. > > - if self.cl.is_redundant() and (not self.is_public() or > > self.getDevice() not in PUBLIC_INTERFACE): > > + if self.cl.is_redundant() and (not self.is_public() or > > (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)): > > CsHelper.execute(cmd2) > > # if not redundant bring everything up if not > > self.cl.is_redundant(): > > > > --- > > > > - Wei > > > > 2016-04-01 20:13 GMT+02:00 Dean Close <dean.cl...@icloudhosting.com>: > > > >> Hi guys, > >> > >> I had been investigating a possible bug with the way interfaces are > >> managed on virtual routers. The public interfaces are being brought > >> up > on > >> backup routers and (because they boot second) they arp the IPs away > >> from the master. I'd been examining an idea for a fix but whilst > >> doing so I found that the system appears to be designed to bring up > >> these > interfaces. > >> > >> I suspect that a few things need to be reworked - but the changes > >> necessary go so far against what has been implemented that I wanted > >> to > open > >> this up before doing the work. > >> > >> Hopefully if I go through my findings you guys can help me see what > >> I might be getting wrong. > >> > >> The following was correct for pre-4.6 redundant routers: > >> > >> 1. Both routers get configured with IP addresses, routes and > >> iptables rules. > >> 2. Public interfaces are initially set as DOWN. > >> 3. Keepalived runs a VRRP instance on the private interface (eth0) > >> to negotiate MASTER/BACKUP roles. > >> 4. Keepalived manages the virtual IP on eth0 used as the public > >> gateway for the guest VMs. > >> 5. Keepalived uses a master notify script to bring up the public > >> interfaces. > >> > >> The above was true for pre-4.6 routers. Now, however, things appear > >> to work differently: > >> > >> 1. Both routers get configured as before. > >> 2. All interfaces apart from eth1 (the Hypervisor-link interface) > >> are set as UP. > >> 3. Keepalived runs a VRRP instance on the first public interface > >> (eth2) to negotiate MASTER/BACKUP roles. > >> 4. Keepalived manages the virtual IP as before. > >> 5. Keepalived uses a master notify script to bring up the public > >> interfaces (unnecessary) 6. Keepalived uses a backup notify script > >> to bring down the public interfaces (unused) > >> > >> This is unexpected for the following reasons: > >> > >> 1. The keepalived notify script brings the public interfaces down > >> when transitioning to BACKUP - so how can we expect to run a VRRP > >> instance > over > >> eth2? > >> 2. If interfaces are down when transitioning to BACKUP, why are > >> they not expected to be down to begin with? (Before the router has > >> become > MASTER) > >> 3. Why are we running a VRRP instance over an interface with an IP > >> that will clash with another host on the network? > >> > >> The following method from the CsIP class in > >> /opt/cloud/bin/cs/CsAddress.py confuses matters futher: > >> > >> def check_is_up(self): > >> """ Ensure device is up """ > >> cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice() for > >> i in CsHelper.execute(cmd): > >> if " DOWN " in i: > >> cmd2 = "ip link set %s up" % self.getDevice() # If redundant only > >> bring up public interfaces that are not eth1. > >> # Reason: private gateways are public interfaces. > >> # master.py and keepalived will deal with eth1 public interface. > >> if self.cl.is_redundant() and (not self.is_public() or > >> self.getDevice() not in PUBLIC_INTERFACE): > >> CsHelper.execute(cmd2) > >> # if not redundant bring everything up if not > >> self.cl.is_redundant(): > >> CsHelper.execute(cmd2) > >> > >> The comments refer to eth1 as a public interface when this is the > >> link > to > >> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as > ['eth1']. > >> But keepalived and master.py don't influence eth1 at all. This > >> looks > like a > >> mistake. > >> > >> Lastly, the logic of this line looks flawed: > >> > >> if self.cl.is_redundant() and (not self.is_public() or > >> self.getDevice() not in PUBLIC_INTERFACE) > >> > >> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` > >> will be ignored. Public IPs will never be assigned to eth1, so this > >> line evaluates as: > >> > >> > >> if self.cl.is_redundant() and (self.getDevice() not in > >> PUBLIC_INTERFACE) > >> > >> which reduces even further to: > >> > >> if self.cs.is_redundant() and self.is_control() > >> > >> > >> What would need doing > >> --------------------- > >> > >> 1. The keepalived.conf template would need to be changed to run the > >> VRRP instance over eth0. > >> 2. The check_is_up method of the CsIP class should be renamed to > >> 'bring_up_interfaces'. For redundant routers it should ignore IPs > >> that > pass > >> is_public or needs_vrrp. > >> 3. The arpPing method should do nothing if the interface is down. > >> 4. The PUBLIC_INTERFACE constant should be either renamed or > >> dropped altogether. > >> 5. Other things that I haven't considered? > >> > >> > >> I'd really appreciate any feedback on this. It's possible that I've > >> got it all wrong but I'm suspecting not. I just don't want to tread > >> on > anyone's > >> toes by submitting a PR that goes against what appears to be an > >> explicit design decision. > >> > >> > >> Kind regards, > >> > >> Dean Close > >> iCloudHosting.com > >> http://www.icloudhosting.com > >> Tel: 01582 227927 > >> > >> Unit 2, Smallmead Road, Reading RG2 0QS > >> > >> ****************************************************************** > >> The names iCloudHosting and iCloudHosting.com are trading styles of > >> BBS Commerce Ltd which is registered in England and Wales, Company > >> Number 04837714. Please use our trading address above for mail. Our > >> registered office is 5 Theale Lakes Business Park, Moulden Way, > >> Sulhamstead, > Reading, > >> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94. > >> > >> This email and any files transmitted with it are confidential and > >> intended solely for the use of the individual or entity to whom > >> they are addressed. If you are not the intended recipient you are > >> not authorised > to > >> and must not disclose, copy, distribute, or retain this message or > >> any > part > >> of it. > >> > >> iCloudHosting accepts no responsibility for information, errors or > >> omissions in this email. > >> ****************************************************************** > >> > >> > > >