[PROPOSAL] reducing VR downtime on upgrade

Daan Hoogland Thu, 15 Feb 2018 07:37:28 -0800

The intention of this proposal is to find a way forward for reducing 
maintenance downtime for virtual routers. There are two parts to this proposal:


  1.  Dealing with legacy routers and replacing them before shutting down.
  2.  Unifying router embodiments and making use of redundancy mechanisms to 
quickly fail over from old to new.

Ad .1 It will always be possible that a router is too old to talk to the new 
version that is to replace it. This might be due to a keepalived update or 
replacement, or just because it is very old. So although unifying the routers 
and making them redundant enabled will solve a lot of use cases, it will never 
cover every conceivable situation, not even in systems upgraded to a version in 
which all intended functionality has been implemented. Dealing with any older 
router is to work as follows:

  1.  A check will be done to make sure the old VR is still up.
     *   If it is not, there is nothing to consider: it will be replaced as 
quickly as possible. Possible improvements here are the iptables configuration 
speedup and other generic optimisations unrelated to the upgrade itself.
     *   If it is, we need to walk on eggshells when provisioning the new one 😉
  2.  A new VR will be instantiated.
  3.  Configuration data will be sent but not applied.
  4.  The interfaces will be added and, if need be, brought down.
  5.  All configuration is applied.
  6.  The old VR is killed.
  7.  The interfaces on the new VR are brought up.
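
To make the ordering concrete, the steps above can be sketched roughly as 
follows. This is only an illustration under my own assumptions: the Router 
class, replace_router and the log strings are hypothetical names and not 
existing CloudStack code, and whether the interfaces can really be added 
before the old VR is killed is exactly what the PoC has to show.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    """Hypothetical stand-in for a virtual router; not a CloudStack class."""
    name: str
    running: bool = True
    interfaces_up: bool = False
    staged_config: Optional[dict] = None
    applied_config: Optional[dict] = None


def replace_router(old: Optional[Router], config: dict, log: List[str]) -> Router:
    """Replace `old` with a fresh router, following the numbered steps."""
    # Step 1: check whether the old VR is still up.
    old_is_up = old is not None and old.running
    log.append("old VR up" if old_is_up else "old VR down")

    # Step 2: instantiate the new VR.
    new = Router(name="new-vr")
    log.append("instantiate")

    # Step 3: send the configuration data without applying it.
    new.staged_config = dict(config)
    log.append("stage config")

    # Step 4: add the interfaces; keep them down while the old VR still serves.
    new.interfaces_up = not old_is_up
    log.append("add interfaces")

    # Step 5: apply all configuration.
    new.applied_config = new.staged_config
    log.append("apply config")

    if old_is_up:
        # Step 6: kill the old VR only once the new one is fully configured.
        old.running = False
        log.append("kill old VR")
        # Step 7: bring the interfaces on the new VR up.
        new.interfaces_up = True
        log.append("bring interfaces up")

    return new
```

The point of this ordering is that the window without a working router is 
limited to the time between steps 6 and 7.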

Ad .2 This is a long-term goal. At the moment we have five (or debatably six) 
different incarnations of the virtual router:

  *   Basic zone dhcp server
  *   Shared network ‘router’
  *   VR
  *   rVR
  *   VPC
  *   rVPC
A first set of steps will be to reduce this to:

  *   shared networks (where a basic zone is an automatic implementation of a 
single shared network in a zone)
  *   VR (which is always redundant enabled but may have only one instance)
  *   VPC (as above)
The next step is then to unify VR and VPC, as a VR is really only a VPC with 
just one network.
The final step is to unify a shared network with a VPC. This one is so far 
ahead that I don’t want to make too many statements about it now. We will have 
to find the exact implementation hazards that we will face in this step along 
the way. I think we will be at least one year in when we reach this point.
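
As a rough illustration of the idea that a VR is only a VPC with one network, 
the unified model could look something like the sketch below. The Vpc class 
and the legacy_vr helper are hypothetical names chosen for this example; they 
do not describe the current CloudStack data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vpc:
    """Hypothetical unified router model; names are illustrative only."""
    name: str
    networks: List[str]
    instances: int = 1  # always redundant enabled, but may run one instance

    def is_plain_vr(self) -> bool:
        # A classic VR is the special case of a VPC with exactly one network.
        return len(self.networks) == 1


def legacy_vr(name: str, network: str, redundant: bool = False) -> Vpc:
    """Express an old-style VR or rVR as a one-network VPC."""
    return Vpc(name=name, networks=[network], instances=2 if redundant else 1)
```

With such a model the VR/rVR and VPC/rVPC distinctions reduce to the number 
of networks and the number of instances.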

At ShapeBlue we will be starting a short PoC on the first part. We will try to 
figure out whether the process under .1 is feasible, or whether we need to wait 
with configuring interfaces until the last moment and then do a ‘blind’ start.

daan.hoogl...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HS UK
@shapeblue
