actually not only this issue, but more
(1) restart vpc router will break the network if there are multiple tiers
in the vpc.
(2) check_heartbeat.sh does not work.

I will create tickets for them next Monday, and create PR on github.

-Wei

2016-04-01 21:14 GMT+02:00 Wei ZHOU <ustcweiz...@gmail.com>:

> Dean,
>
> I just fixed it yesterday.
>
> the commit is
>
> ---
>
> diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> index 5f63c06..5256d03 100755
> --- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> +++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
> @@ -27,7 +27,7 @@ from CsRoute import CsRoute
>  from CsRule import CsRule
>
>  VRRP_TYPES = ['guest']
> -PUBLIC_INTERFACE = ['eth1']
> +VPC_PUBLIC_INTERFACE = ['eth1']
>
>  class CsAddress(CsDataBag):
>
> @@ -323,7 +323,7 @@ class CsIP:
>                  # If redundant only bring up public interfaces that are
> not eth1.
>                  # Reason: private gateways are public interfaces.
>                  # master.py and keepalived will deal with eth1 public
> interface.
> -                if self.cl.is_redundant() and (not self.is_public() or
> self.getDevice() not in PUBLIC_INTERFACE):
> +                if self.cl.is_redundant() and (not self.is_public() or
> (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
>                      CsHelper.execute(cmd2)
>                  # if not redundant bring everything up
>                  if not self.cl.is_redundant():
>
> ---
>
> - Wei
>
> 2016-04-01 20:13 GMT+02:00 Dean Close <dean.cl...@icloudhosting.com>:
>
>> Hi guys,
>>
>> I had been investigating a possible bug with the way interfaces are
>> managed on virtual routers. The public interfaces are being brought up on
>> backup routers and (because they boot second) they arp the IPs away from
>> the master. I'd been examining an idea for a fix but whilst doing so I
>> found that the system appears to be designed to bring up these interfaces.
>>
>> I suspect that a few things need to be reworked - but the changes
>> necessary go so far against what has been implemented that I wanted to open
>> this up before doing the work.
>>
>> Hopefully if I go through my findings you guys can help me see what I
>> might be getting wrong.
>>
>> The following was correct for pre-4.6 redundant routers:
>>
>>   1. Both routers get configured with IP addresses, routes and iptables
>> rules.
>>   2. Public interfaces are initially set as DOWN.
>>   3. Keepalived runs a VRRP instance on the private interface (eth0) to
>> negotiate MASTER/BACKUP roles.
>>   4. Keepalived manages the virtual IP on eth0 used as the public gateway
>> for the guest VMs.
>>   5. Keepalived uses a master notify script to bring up the public
>> interfaces.
>>
>> The above was true for pre-4.6 routers. Now, however, things appear to
>> work differently:
>>
>>   1. Both routers get configured as before.
>>   2. All interfaces apart from eth1 (the Hypervisor-link interface) are
>> set as UP.
>>   3. Keepalived runs a VRRP instance on the first public interface (eth2)
>> to negotiate MASTER/BACKUP roles.
>>   4. Keepalived manages the virtual IP as before.
>>   5. Keepalived uses a master notify script to bring up the public
>> interfaces (unnecessary)
>>   6. Keepalived uses a backup notify script to bring down the public
>> interfaces (unused)
>>
>> This is unexpected for the following reasons:
>>
>>   1. The keepalived notify script brings the public interfaces down when
>> transitioning to BACKUP - so how can we expect to run a VRRP instance over
>> eth2?
>>   2. If interfaces are down when transitioning to BACKUP, why are they
>> not expected to be down to begin with? (Before the router has become MASTER)
>>   3. Why are we running a VRRP instance over an interface with an IP that
>> will clash with another host on the network?
>>
>> The following method from the CsIP class in
>> /opt/cloud/bin/cs/CsAddress.py confuses matters futher:
>>
>>     def check_is_up(self):
>>          """ Ensure device is up """
>>          cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
>>          for i in CsHelper.execute(cmd):
>>              if " DOWN " in i:
>>                  cmd2 = "ip link set %s up" % self.getDevice()
>>                  # If redundant only bring up public interfaces that are
>> not eth1.
>>                  # Reason: private gateways are public interfaces.
>>                  # master.py and keepalived will deal with eth1 public
>> interface.
>>                  if self.cl.is_redundant() and (not self.is_public() or
>> self.getDevice() not in PUBLIC_INTERFACE):
>>                      CsHelper.execute(cmd2)
>>                  # if not redundant bring everything up
>>                  if not self.cl.is_redundant():
>>                      CsHelper.execute(cmd2)
>>
>> The comments refer to eth1 as a public interface when this is the link to
>> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1'].
>> But keepalived and master.py don't influence eth1 at all. This looks like a
>> mistake.
>>
>> Lastly, the logic of this line looks flawed:
>>
>>   if self.cl.is_redundant() and (not self.is_public() or self.getDevice()
>> not in PUBLIC_INTERFACE)
>>
>> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will
>> be ignored. Public IPs will never be assigned to eth1, so this line
>> evaluates as:
>>
>>
>>   if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)
>>
>> which reduces even further to:
>>
>>   if self.cs.is_redundant() and self.is_control()
>>
>>
>> What would need doing
>> ---------------------
>>
>>   1. The keepalived.conf template would need to be changed to run the
>> VRRP instance over eth0.
>>   2. The check_is_up method of the CsIP class should be renamed to
>> 'bring_up_interfaces'. For redundant routers it should ignore IPs that pass
>> is_public or needs_vrrp.
>>   3. The arpPing method should do nothing if the interface is down.
>>   4. The PUBLIC_INTERFACE constant should be either renamed or dropped
>> altogether.
>>   5. Other things that I haven't considered?
>>
>>
>> I'd really appreciate any feedback on this. It's possible that I've got
>> it all wrong but I'm suspecting not. I just don't want to tread on anyone's
>> toes by submitting a PR that goes against what appears to be an explicit
>> design decision.
>>
>>
>> Kind regards,
>>
>> Dean Close
>> iCloudHosting.com
>> http://www.icloudhosting.com
>> Tel: 01582 227927
>>
>> Unit 2, Smallmead Road, Reading RG2 0QS
>>
>> ******************************************************************
>> The names iCloudHosting and iCloudHosting.com are trading styles of BBS
>> Commerce Ltd which is registered in England and Wales, Company Number
>> 04837714. Please use our trading address above for mail. Our registered
>> office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead, Reading,
>> Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.
>>
>> This email and any files transmitted with it are confidential and
>> intended solely for the use of the individual or entity to whom they are
>> addressed. If you are not the intended recipient you are not authorised to
>> and must not disclose, copy, distribute, or retain this message or any part
>> of it.
>>
>> iCloudHosting accepts no responsibility for information, errors or
>> omissions in this email.
>> ******************************************************************
>>
>>
>

Reply via email to