We have installed the Ocata -proposed package. However, the situation is
this:

- there are 464 routers configured on 3 Neutron gateway hosts, using l3-ha, and
each router is scheduled to all 3 hosts.
- we installed the package because we were in the middle of an incident
with multiple l3 agents active, hoping the package update would solve the
problem.  One of the gateway hosts was being rebooted at the time, also to try
to do a King Canute and halt the tidal wave of ARP.
- We later found that openvswitch had run out of filehandles, see LP: #1737866
- Resolving that allowed ovs to create a ton more filehandles.
- Removing and re-adding the routers to agents seemed to clean things up; we saw
some routers with multiple agents active, and some with none active (all 3
agents 'standby').
- After a few iterations of that, things cleaned up.
- 15-20 mins later, we saw more routers with multiple agents active (ones which 
weren't before), and ran through the same cleanup steps.  At this time, there 
were a large number of keepalived messages in syslog, particularly routers 
becoming MASTER then BACKUP again. (https://pastebin.canonical.com/205361/)
- after another hour or two, we're still clean.
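
For anyone wanting to script the check for the two bad states described
above (more than one agent active, or all agents 'standby'), here is a
minimal sketch in Python. It assumes the per-router ha_state values have
already been collected, for example from the output of
`neutron l3-agent-list-hosting-router`. The function name and the sample
data are hypothetical and are not what was run during the incident.

```python
from typing import Dict, List


def classify_routers(ha_states: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group router IDs by how many of their l3 agents report 'active'."""
    result: Dict[str, List[str]] = {
        "multiple_active": [],
        "none_active": [],
        "ok": [],
    }
    for router_id, states in sorted(ha_states.items()):
        active = sum(1 for state in states if state == "active")
        if active > 1:
            # The split-brain case: several agents claim the router.
            result["multiple_active"].append(router_id)
        elif active == 0:
            # Every agent is 'standby', so nothing is forwarding traffic.
            result["none_active"].append(router_id)
        else:
            result["ok"].append(router_id)
    return result


# Hypothetical sample data, not taken from the affected cloud.
sample = {
    "router-a": ["active", "standby", "standby"],
    "router-b": ["active", "active", "standby"],
    "router-c": ["standby", "standby", "standby"],
}
report = classify_routers(sample)
```

Routers that land in "multiple_active" or "none_active" would be the
candidates for the remove/re-add cleanup described above.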

I can't say at this stage whether the fix actually fixed the problem or
not - I need to dig further to find out if there could have been some
process running cleanups.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1731595

Title:
  L3 HA: multiple agents are active at the same time

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1731595/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
