[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-07-03 Thread Corey Bryant
It appears the following commits are required to fix this for
keepalived:

commit e90a633c34fbe6ebbb891aa98bf29ce579b8b45c
Author: Quentin Armitage 
Date:   Fri Dec 15 21:14:24 2017 +

Fix removing left-over addresses if keepalived aborts

Issue #718 reported that if keepalived terminates abnormally when
it has vrrp instances in master state, it doesn't remove the
left-over VIPs and eVIPs when it restarts. This is despite
commit f4c10426c saying that it resolved this problem.

It turns out that commit f4c10426c did not resolve the problem for VIPs
or eVIPs, although it did resolve the issue for iptables and ipset
configuration.

This commit now really resolves the problem, and residual VIPs and
eVIPs are removed at startup.

Signed-off-by: Quentin Armitage 


commit f4c10426ca0a7c3392422c22079f1b71e7d4ebe9
Author: Quentin Armitage 
Date:   Sun Mar 6 09:53:27 2016 +

Remove ip addresses left over from previous failure

If keepalived terminates unexpectedly, for any instances for which
it was master, it leaves ip addresses configured on the interfaces.
When keepalived restarts, if it starts in backup mode, the addresses
must be removed. In addition, any iptables/ipsets entries added for
!accept_mode must also be removed, in order to avoid multiple entries
being created in iptables.

This commit removes any addresses and iptables/ipsets configuration
for any interfaces that exist when keepalived starts up. If keepalived
shut down cleanly, that will only be for non-vmac interfaces, but if
it terminated unexpectedly, it can also be for any left-over vmacs.

Signed-off-by: Quentin Armitage 


f4c10426ca0a7c3392422c22079f1b71e7d4ebe9 is already included in:
* keepalived 1:1.3.9-1build1 (bionic/queens, cosmic/rocky)
* keepalived 1:1.3.2-1build1 (artful/pike)
* keepalived 1:1.3.2-1 (zesty/ocata) [1]

[1] zesty is EOL -
https://launchpad.net/ubuntu/+source/keepalived/1:1.3.2-1

f4c10426ca0a7c3392422c22079f1b71e7d4ebe9 is not included in:
* keepalived 1:1.2.19-1ubuntu0.2 (xenial/mitaka)

The backport of f4c10426ca0a7c3392422c22079f1b71e7d4ebe9 to xenial does
not look trivial. I'd prefer to backport keepalived 1:1.3.2-* to the
pike/ocata cloud archives.

-- 
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1744062

Title:
  L3 HA: multiple agents are active at the same time

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744062/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-07-03 Thread Corey Bryant
As reported by Xav in https://bugs.launchpad.net/ubuntu/+bug/1731595:

"Comment for the folks that are noticing this as 'fix released' but
still affected - see
https://github.com/acassen/keepalived/commit/e90a633c34fbe6ebbb891aa98bf29ce579b8b45c
for the rest of this fix, we need keepalived to be at least 1.4.0 in
order to have this commit."

I just checked and the patch Xav referenced can be backported fairly
cleanly to at least keepalived 1:1.2.19-1 (xenial/mitaka) and above.
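
As a rough way to triage hosts against the "at least 1.4.0" statement
quoted above, the upstream part of a package version string can be
compared with 1.4.0. This is only a sketch: the function names are made
up for illustration, it handles plain numeric upstream versions only,
and a distro package with a lower version may still carry the commit as
a backported patch, so a False result here is not proof that the fix is
missing.

```python
import re

# Assumption: 1.4.0 is the first upstream release containing commit
# e90a633c, as stated in the quoted comment. Distro packages may carry
# the commit as a backport even with a lower upstream version.
FIXED_UPSTREAM = (1, 4, 0)


def upstream_version(deb_version):
    """Return the upstream part of a Debian version as a tuple of ints.

    Only plain numeric upstream versions (e.g. "1.3.9") are handled.
    """
    # Drop an optional epoch such as "1:".
    no_epoch = deb_version.split(":", 1)[-1]
    # Drop the Debian/Ubuntu revision such as "-1build1", if present.
    upstream = no_epoch.rsplit("-", 1)[0] if "-" in no_epoch else no_epoch
    return tuple(int(part) for part in re.findall(r"\d+", upstream))


def has_upstream_fix(deb_version):
    """True if the upstream version is at least FIXED_UPSTREAM."""
    return upstream_version(deb_version) >= FIXED_UPSTREAM


if __name__ == "__main__":
    for version in ("1:1.2.19-1ubuntu0.2", "1:1.3.9-1build1", "1:1.4.0-1"):
        print(version, has_upstream_fix(version))
```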

** Also affects: keepalived (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: keepalived (Ubuntu Artful)

** Changed in: keepalived (Ubuntu)
   Importance: Undecided => High

** Changed in: keepalived (Ubuntu)
   Status: New => Triaged

** Changed in: keepalived (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: keepalived (Ubuntu Xenial)
   Status: New => Triaged

** Changed in: keepalived (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: keepalived (Ubuntu Bionic)
   Status: New => Triaged

** No longer affects: cloud-archive/newton

** No longer affects: neutron (Ubuntu Artful)

[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-05-10 Thread LIU Yulong
Loss of VRRP heartbeats may cause the multiple-active-router behavior.
The underlying connectivity is the key point here. Monitoring for such
behavior is also necessary.
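
A minimal sketch of what such monitoring could check, assuming the
per-router HA state has already been collected (for example from the
output of "neutron l3-agent-list-hosting-router") into tuples of
(router_id, agent_host, ha_state). The function name and the input
shape are assumptions made for illustration, not part of Neutron.

```python
from collections import defaultdict


def routers_with_multiple_active(bindings):
    """Return {router_id: [hosts]} for routers active on more than one agent.

    bindings is an iterable of (router_id, agent_host, ha_state) tuples,
    where ha_state is expected to be "active" or "standby".
    """
    active_hosts = defaultdict(list)
    for router_id, agent_host, ha_state in bindings:
        if ha_state == "active":
            active_hosts[router_id].append(agent_host)
    # Keep only routers that more than one agent reports as active.
    return {
        router_id: sorted(hosts)
        for router_id, hosts in active_hosts.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    sample = [
        ("router-a", "net1", "active"),
        ("router-a", "net2", "standby"),
        ("router-b", "net1", "active"),
        ("router-b", "net3", "active"),
    ]
    print(routers_with_multiple_active(sample))
```

Running a check like this periodically and alerting on a non-empty
result would at least make the active/active condition visible early.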


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-04-12 Thread Joris S'heeren
Our environment is experiencing the same behavior.

Ocata on 16.04 - around 320 routers, some have 2 agents per router,
others have 3 agents per router.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-03-26 Thread George
We experience the same issue, although we have a smaller environment with
only around 40 Neutron routers running HA (two agents per router) over
three physical controllers running Ocata on Ubuntu 16.04.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-03-20 Thread Ryan Beisner
@cgregan I think this is really a situation where the desired
scale/density is at odds with the fundamental design of neutron HA
routers.  It's not something to address in the charms or in the
packaging.  As such, I'd consider this a feature request, against
upstream Neutron, and I don't have an assignee for that.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-03-02 Thread Chris Gregan
We need an assigned engineer to meet the requirements of Field High SLA.
Please assign.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-02-16 Thread James Page
Just as another thing to consider - the deployment where this is
happening also experienced bug 1749425, which resulted in packet loss;
the networking between network/gateway units also goes via OVS, so if
OVS was dropping packets due to the large number of missing tap devices,
it's possible this was also impacting connectivity between keepalived
instances for HA routers, resulting in active/active nasty-ness.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-02-14 Thread James Troup
Downgrading to Field High - I think the Critical part is tracked in LP
#1749425


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-02-14 Thread James Troup
FYI: Resubscribing field SLA.  It was not raised as critical by Kiko R.;
I raised it and it's still an active ongoing problem on a customer site.
Please do not unsubscribe again without discussion with the correct
people.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-02-08 Thread Ryan Beisner
I do think this issue is still of high importance to OpenStack's overall
scale and resilience story.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-02-08 Thread Ryan Beisner
FYI:  Unsubscribing field SLA based on re-triage with Kiko R.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-01-24 Thread Hua Zhang
Here are my thoughts on this problem:

1, First of all, we need to figure out why multiple ACTIVE master HA
nodes can appear at all.

Assume the master dies (at this point its status in the DB is still
ACTIVE); a slave will then be elected as the new master. After the old
master has recovered, the call to enable_keepalived() at line 444 [4]
will be invoked to spawn a keepalived instance, so multiple ACTIVE
master HA nodes occur.
(Related patch - https://review.openstack.org/#/c/357458/)

So the key to solving this problem is to reset the status of all HA
ports to DOWN at some point in the code, which is what the patch
https://review.openstack.org/#/c/470905/ does.
But this patch sets status=DOWN in the code path
'fetch_and_sync_all_routers -> get_router_ids', which leads to a
bigger problem when the load is high.

2, Why does setting status=DOWN in the code path
'fetch_and_sync_all_routers -> get_router_ids' lead to a bigger problem
when the load is high?

If the l3-agent is considered not active by the heartbeat check, it will
be given status=AGENT_REVIVED [1] and then triggered to do a full
sync (self.fullsync=True) [2], so that the code path
'periodic_sync_routers_task -> fetch_and_sync_all_routers' is
called again and again [3].

All these operations increase the load on the l2-agent, l3-agent, DB
and MQ etc. Conversely, high load also makes the AGENT_REVIVED case
more likely.

So it's a vicious circle, which the patch
https://review.openstack.org/#/c/522792/ is meant to break.
It uses the code path '__init__ -> get_service_plugin_list ->
_update_ha_network_port_status' instead of the code path
'periodic_sync_routers_task -> fetch_and_sync_all_routers'.
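
To make the difference between the two code paths concrete, here is a
toy model (not Neutron code; ToyPlugin and simulate_resets are made-up
names) that only counts how often the HA port status reset would run.
It assumes one reset per full sync in the first variant and a single
reset at agent start in the second, which is my reading of the two
patches.

```python
class ToyPlugin:
    """Stand-in for the server side; it only counts reset requests."""

    def __init__(self):
        self.resets = 0

    def reset_ha_ports(self, host):
        # In the real system this would be a DB update of HA port status.
        self.resets += 1


def simulate_resets(full_syncs, reset_on_sync):
    """Count resets for a given number of full syncs.

    reset_on_sync=True models resetting inside every full sync;
    reset_on_sync=False models resetting once when the agent starts.
    """
    plugin = ToyPlugin()
    if not reset_on_sync:
        plugin.reset_ha_ports("network-node")  # once, at agent start
    for _ in range(full_syncs):
        if reset_on_sync:
            plugin.reset_ha_ports("network-node")  # on every full sync
    return plugin.resets


if __name__ == "__main__":
    print(simulate_resets(10, reset_on_sync=True))
    print(simulate_resets(10, reset_on_sync=False))
```

With repeated AGENT_REVIVED events the number of full syncs grows, so
in the first variant the number of resets grows with it, while in the
second it stays constant.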

3, As we know, a small heartbeat value can cause AGENT_REVIVED and thus
increase the load, and high load can cause other problems, such as the
phenomena Xav mentioned before, which I paste below as well:

- We later found that openvswitch had run out of filehandles, see LP: #1737866
- Resolving that allowed ovs to create a ton more filehandles.

This is just an example; there may be other circumstances. All of these
can mislead us into thinking that the fix doesn't fix the problem.

High load can also cause other, similar problems. For example:

a, it can cause the neutron-keepalived-state-change process to exit on
a TERM signal [5] (https://paste.ubuntu.com/26450042/).
neutron-keepalived-state-change monitors VRRP VIP changes and then
reports the ha_router's status to neutron-server [6], so once it exits
the l3-agent is no longer able to update the status of the HA ports, and
we can see multiple ACTIVE, multiple STANDBY, or other inconsistent
cases.

b, it can cause the RPC message sent from here [6] not to be handled
properly.


So for this problem, my concrete suggestions are:

a, bump up the heartbeat option (agent_down_time)

b, we need this patch: https://review.openstack.org/#/c/522641/

c, ensure that other components (like MQ, DB etc) have no performance
problems
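
For suggestion a, a quick sanity check of the heartbeat settings might
look like the sketch below. It assumes agent_down_time lives in
[DEFAULT] and report_interval in [agent] of neutron.conf, that the
defaults are 75 and 30 seconds, and that agent_down_time should be at
least twice report_interval; these values are from memory and should be
verified against the Neutron release in use. The function name is made
up for illustration.

```python
import configparser

# Assumed defaults; verify against the Neutron release you run.
DEFAULT_AGENT_DOWN_TIME = 75.0
DEFAULT_REPORT_INTERVAL = 30.0


def check_heartbeat_settings(conf_text):
    """Return (agent_down_time, report_interval, ok) for neutron.conf text.

    ok is True when agent_down_time is at least twice report_interval.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    down_time = parser.getfloat(
        "DEFAULT", "agent_down_time", fallback=DEFAULT_AGENT_DOWN_TIME
    )
    report_interval = parser.getfloat(
        "agent", "report_interval", fallback=DEFAULT_REPORT_INTERVAL
    )
    return down_time, report_interval, down_time >= 2 * report_interval


if __name__ == "__main__":
    sample = "[DEFAULT]\nagent_down_time = 150\n\n[agent]\nreport_interval = 60\n"
    print(check_heartbeat_settings(sample))
```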


[1] https://github.com/openstack/neutron/blob/stable/ocata/neutron/db/agents_db.py#L354
[2] https://github.com/openstack/neutron/blob/stable/ocata/neutron/agent/l3/agent.py#L736
[3] https://github.com/openstack/neutron/blob/stable/ocata/neutron/agent/l3/agent.py#L583
[4] https://github.com/openstack/neutron/blob/stable/ocata/neutron/agent/l3/ha_router.py#L444
[5] https://github.com/openstack/neutron/blob/stable/ocata/neutron/agent/l3/keepalived_state_change.py#L134
[6] https://github.com/openstack/neutron/blob/stable/ocata/neutron/agent/l3/keepalived_state_change.py#L71


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-01-23 Thread John George
This bug falls under the Canonical Cloud Engineering service-level
agreement (SLA) process, as a field critical bug.


[Bug 1744062] Re: L3 HA: multiple agents are active at the same time

2018-01-18 Thread Alvaro Uría
** Tags added: canonical-bootstack
