Re: [DISCUSS] VR upgrade downtime reduction

Daan Hoogland Tue, 06 Feb 2018 05:51:40 -0800

Looking forward to your blog(s), Remi. Sounds like you guys are still having
fun.


PS: did you review your PR, the one I submitted for you ;) ?

On Tue, Feb 6, 2018 at 2:47 PM, Remi Bergsma <rberg...@schubergphilis.com>
wrote:

> Hi Daan,
>
> In my opinion the biggest issue is the fact that there are a lot of
> different code paths: VPC versus non-VPC, VPC versus redundant-VPC, etc.
> That's why you cannot simply switch from a single VPC to a redundant VPC
> for example.
>
> For SBP, we mitigated that in Cosmic by converting all non-VPCs to a VPC
> with a single tier and made sure all features are supported. Next we merged
> the single and redundant VPC code paths. The idea here is that redundancy
> or not should only be a difference in the number of routers. Code should be
> the same. A single router is also "master", but there just is no "backup".
>
> That simplifies things A LOT, as keepalived is now the master of the whole
> thing. No more assigning IP addresses in Python; we leave that to
> keepalived instead. Lots of code deleted. Easier to maintain, way more
> stable. We just released Cosmic 6 that has this feature and are now rolling
> it out in production. Looking good so far. This change unlocks a lot of
> possibilities, like live upgrading from a single VPC to a redundant one
> (and back). In the end, if the redundant VPC is rock solid, you most likely
> don't even want single VPCs any more. But that will come.
>
> As I said, we're rolling this out as we speak. In a few weeks when
> everything is upgraded I can share what we learned and how well it works.
> CloudStack could use a similar approach.
>
> Kind Regards,
> Remi
>
>
>
> On 05/02/2018, 16:44, "Daan Hoogland" <daan.hoogl...@gmail.com> wrote:
>
>     Hi devs,
>
>     I have recently (re-)submitted two PRs, one by Wei [1] and one by Remi
>     [2], that reduce downtime for redundant routers and redundant VPCs
>     respectively. (Please review those.)
>     Now from customers we hear that they also want to reduce downtime for
>     regular VRs, so as we discussed this we came to two possible solutions,
>     one of which we want to implement:
>
>     1. Start and configure a new router before destroying the old one, and
>     then, as a last-minute action, stop the old one.
>     2. Make all routers start up redundancy services, but for regular routers
>     start only one until an upgrade is required, at which time a new, second
>     router can be started before killing the old one.
>
>     Obviously both solutions have their merits, so I want to have your input
>     to make the most broadly supported implementation.
>     -1 means there will be an overlap or a small delay and interruption of
>     service.
>     +1 It can be argued, "they got what they paid for".
>     -2 means an overhead in memory usage by the router by the extra services
>     running on it.
>     +2 the number of router varieties will be further reduced.
>
>     -1&-2 We have to deal with potentially large upgrade steps, even from way
>     before the CloudStack era, and might be stuck with 1 because of that,
>     needing to hack around it. Any dealing with older VRs, pre-4.5 and
>     especially pre-4.0, will be hard.
>
>     I am not cross-posting, though this might be one of those occasions where
>     it is appropriate to include users@. Just my purist inhibitions.
>
>     Of course I have preferences but can you share your thoughts, please?
>     
>     And don't forget to review Wei's [1] and Remi's [2] work please.
>
>     [1] https://github.com/apache/cloudstack/pull/2435
>     [2] https://github.com/apache/cloudstack/pull/2436
>
>     --
>     Daan
>
>
>


-- 
Daan
