The intention of this proposal is to find a way forward to reduce maintenance
downtime for virtual routers. There are two parts to this proposal:
1. Dealing with legacy routers and replacing them before shutting down.
2. Unifying router embodiments and making use of redundancy mechanisms to
fail over quickly from old to new.
Ad 1. It will always be possible that a router is too old to talk to the new
version that is to replace it. This might be due to a keepalived update or
replacement, or simply because it is very old. So although unifying the routers
and making them redundancy-enabled will solve a lot of use cases, it will never
cover every conceivable situation, not even in systems upgraded to a version in
which all intended functionality has been implemented.
Dealing with an older router is to work as follows:
1. A check will be done to make sure the old VR is still up.
* If it is not, no special care is needed and it will be replaced as quickly
as possible. Possible improvements here are the iptables configuration speedup
and other generic optimisations unrelated to the upgrade itself.
* If it is, we need to tread carefully when provisioning the new one 😉
2. A new VR will be instantiated.
3. Configuration data will be sent but not applied.
4. The interfaces will be added and, if need be, brought down.
5. All configuration is applied.
6. The old VR is killed.
7. The interfaces on the new VR are brought up.
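To make the ordering of these steps concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `Router`, `replace_router` and the `log` list are hypothetical names, not CloudStack code, and the choice to bring interfaces up immediately when the old VR is already down is my reading of "replaced as quickly as possible", not something that has been decided.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical stand-in for a virtual router; not a CloudStack class."""
    name: str
    running: bool = False
    interfaces_up: bool = False
    pending_config: dict = field(default_factory=dict)
    applied_config: dict = field(default_factory=dict)


def replace_router(old: Router, config: dict, log: list) -> Router:
    """Replace `old` with a new router, recording each step in `log`."""
    # Step 1: check whether the old VR is still up.
    old_is_up = old.running
    log.append("old up" if old_is_up else "old down")

    # Step 2: instantiate the new VR.
    new = Router(name=old.name + "-new", running=True)
    log.append("instantiate")

    # Step 3: send the configuration data, but do not apply it yet.
    new.pending_config = dict(config)
    log.append("send config")

    # Step 4: add the interfaces; keep them down only while the old VR
    # is still serving traffic (assumption, see the note above).
    new.interfaces_up = not old_is_up
    log.append("add interfaces")

    # Step 5: apply all configuration.
    new.applied_config, new.pending_config = new.pending_config, {}
    log.append("apply config")

    # Step 6: kill the old VR, if there is one to kill.
    if old_is_up:
        old.running = False
        old.interfaces_up = False
        log.append("kill old")

    # Step 7: bring the interfaces on the new VR up, if they are still down.
    if not new.interfaces_up:
        new.interfaces_up = True
        log.append("interfaces up")
    return new
```

In this sketch the window without a serving router is the gap between steps 6 and 7, which is exactly the part the proposal wants to keep as short as possible.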
Ad 2. This is a long-term goal. At the moment we have five (or arguably six)
different incarnations of the virtual router:
* Basic zone DHCP server
* Shared network ‘router’
* VR
* rVR
* VPC
* rVPC
A first set of steps will be to reduce this to:
* shared networks (where a basic zone is an automatic implementation of a
single shared network in a zone)
* VR (which is always redundancy-enabled but may have only one instance)
* VPC (as above)
The next step is then to unify VR and VPC, as a VR is really only a VPC with
just one network.
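As a rough illustration of that idea, the sketch below models a VR as nothing more than a VPC that happens to serve a single network. The `Vpc` class, its fields and the `make_vr` helper are hypothetical names chosen for this example; they do not correspond to existing CloudStack classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vpc:
    """Hypothetical unified router model; names are illustrative only."""
    name: str
    networks: List[str]
    instances: int = 1  # redundancy-enabled, but possibly only one instance

    @property
    def is_plain_vr(self) -> bool:
        # What is called a VR today would be a VPC with exactly one network.
        return len(self.networks) == 1


def make_vr(name: str, network: str, instances: int = 1) -> Vpc:
    """Build what is currently called a VR as a single-network VPC."""
    return Vpc(name=name, networks=[network], instances=instances)
```

With such a model, redundancy would be a matter of the `instances` count rather than a separate router type.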
The final step is then to unify a shared network with a VPC. This one is so
far ahead that I don’t want to make too many statements about it now. We will
have to discover the exact implementation hazards of this step along the way.
I think we will be at least one year in by the time we reach this point.
As ShapeBlue, we will be starting a short PoC on the first part. We will try to
figure out whether the process described under part 1 is feasible, or whether
we need to postpone configuring the interfaces until the last moment and then
do a ‘blind’ start.