Re: HA redundant virtual router

Daan Hoogland Sat, 07 Sep 2013 12:09:48 -0700

Hi Sheng, thanks. Will read it soon and comment or propose
additions/alterations


mobile bilingual spell checker used
On 6 Sep. 2013 00:27, "Sheng Yang" <sh...@yasker.org> wrote:

> Here is the doc.
>
>
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Redundant+Virtual+Router+Functional+Spec
>
> It's not extremely detailed, but it describes today's design in general.
>
> --Sheng
>
>
> On Thu, Aug 29, 2013 at 8:17 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>
>> ok,
>>
>> let's postpone the discussion until you are at least half done. We
>> will of course continue to deliberate on what we need internally.
>>
>> Daan
>>
>> On Thu, Aug 29, 2013 at 5:08 PM, Sheng Yang <sh...@yasker.org> wrote:
>> > Hi Daan,
>> >
>> > As I said, I am writing a design doc to describe the current redundant
>> > router policy, to help people understand the redundant router. Currently it
>> > doesn't support VPC, so how to implement it in VPC is still open for discussion.
>> >
>> > --Sheng
>> >
>> >
>> > On Thu, Aug 29, 2013 at 4:26 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>> >>
>> >> Sheng,
>> >>
>> >> just to make sure: you are going to write this document? I see Roeland
>> >> understood your mail that way.
>> >>
>> >> When you do, I'd like you to keep in mind that we also want redundant
>> >> routers within a VPC to make ACS upgrades more seamless for
>> >> customer application groups and DTAP streets. If you need any help
>> >> writing such a doc, let me know.
>> >>
>> >> kind regards,
>> >> Daan
>> >>
>> >> On Thu, Aug 29, 2013 at 1:13 PM, Roeland Kuipers
>> >> <rkuip...@schubergphilis.com> wrote:
>> >> > Hi Sheng,
>> >> >
>> >> > Thanks for the info. Looking forward to the design doc; I trust it
>> >> > will make things clearer.
>> >> > In the meantime we will be doing some research and thinking too, to see how
>> >> > we can improve things to also have HA on the RvR in a safe way.
>> >> > We will share this once ready.
>> >> >
>> >> > Thanks,
>> >> > Roeland
>> >> >
>> >> >
>> >> > From: Sheng Yang [mailto:sh...@yasker.org]
>> >> > Sent: Thursday 29 August 2013 0:19
>> >> > To: <dev@cloudstack.apache.org>
>> >> > Cc: int-cloud; Daan Hoogland
>> >> > Subject: Re: HA redundant virtual router
>> >> >
>> >> > Hi Roeland,
>> >> >
>> >> > I will write a design doc to explain how the redundant router currently
>> >> > works. For example, for point 2, we have to force BACKUP to become
>> >> > MASTER because:
>> >> >
>> >> > 1. CS cannot communicate with MASTER at the time.
>> >> > 2. CS can communicate with BACKUP.
>> >> > 3. The rule has to be programmed immediately.
>> >> > 4. In case the old MASTER comes back, it should yield to the VR with the
>> >> > updated rule, rather than preempt the updated VR.
>> >> >
>> >> > In this case, CS needs to communicate with the RvR to program the new rule,
>> >> > so it needs to intervene in the RvR to ensure that if only one VR got
>> >> > the rule, that VR becomes MASTER.
>> >> >
>> >> > Still, I will write a doc later to try to cover every concern of the RvR
>> >> > design.
>> >> >
>> >> > --Sheng
>> >> >
>> >> > On Tue, Aug 27, 2013 at 3:40 AM, Roeland Kuipers
>> >> > <rkuip...@schubergphilis.com> wrote:
>> >> > Hi Sheng,
>> >> >
>> >> > Thanks for your reply. I'll see if we can reproduce this scenario.
>> >> >
>> >> > With respect to point 1: a good principle IMHO.
>> >> >
>> >> > Point 2: Why do we force a keepalived node to become master instead of
>> >> > waiting for keepalived to make it master? That way there is less reason to
>> >> > intervene and less risk of multiple masters, a behavior we have seen with
>> >> > RvR without HA in the past. The downside is that rule updates do not take
>> >> > effect until the backup becomes master. But maybe this is wise anyway, since
>> >> > something is wrong. This conflicts a bit with point 1, as we do intervene here.
>> >> >
>> >> > Point 3: In my opinion keepalived is solid enough to leave this
>> >> > responsibility with keepalived; CS should just check the state and
>> >> > not fiddle with priorities to force masters, because there is obviously a
>> >> > reason why BACKUP refuses to become master.
>> >> > I think we should let keepalived prevent multiple masters, as it is
>> >> > designed to do. Or am I missing something here?
>> >> > Actually, in the scenario you described, with a functioning guest
>> >> > network, keepalived should be able to handle this situation if we make sure
>> >> > all routers have different priorities.
>> >> >
>> >> > I still have the opinion HA and RvR are different mechanisms.
>> >> >
>> >> > So what do you think is necessary to make HA possible in combination with RvR?
>> >> > We have a clear business requirement to have this implemented in CS, and we
>> >> > have developers willing to make the changes needed to make this possible.
>> >> > We would also like to see RvR on VPCs and are willing to contribute this
>> >> > functionality too.
>> >> >
>> >> > Thanks for your feedback!
>> >> >
>> >> > Cheers,
>> >> > Roeland
>> >> >
>> >> > -----Original Message-----
>> >> > From: Sheng Yang [mailto:sh...@yasker.org]
>> >> > Sent: Friday 23 August 2013 23:25
>> >> > To: <dev@cloudstack.apache.org>
>> >> > Subject: Re: HA redundant virtual router
>> >> >
>> >> > Hi Roeland,
>> >> >
>> >> > Thank you for your testing!
>> >> >
>> >> > Power off is not a concern right now, because in that case the VM would
>> >> > disappear anyway.
>> >> >
>> >> > Our concern is more about the case where the VM is still alive but we cannot
>> >> > detect it for a while. For example, a network glitch happens, CS loses
>> >> > connection to the host temporarily (control network), but the guest network
>> >> > is still working.
>> >> > HA would start another VR, which could result in 3 routers in the guest
>> >> > network (at least for a moment). Much of the policy focuses on dealing with
>> >> > these intermediate states. Also, if you unplug the network cable of one host,
>> >> > many things can happen...
>> >> >
>> >> >
>> >> > In RvR we want to make sure:
>> >> > 1. The status is self-governed, with no need for CS to intervene.
>> >> > 2. MASTER always gets the latest rules. That means, if we cannot
>> >> > communicate with MASTER, we turn to BACKUP, program the rule on it
>> >> > and make it MASTER, even though we cannot communicate with MASTER at that
>> >> > time. And BACKUP should be able to become MASTER on request. This is
>> >> > achieved by using a script to bump up the priority of BACKUP.
>> >> > 3. Try our best to prevent the dual-MASTER situation. So we program
>> >> > a different priority for each VR, and the MASTER/BACKUP status depends
>> >> > completely on priority.
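The priority rules in points 2 and 3 can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not CloudStack's or keepalived's actual code: it models only the VRRP election (a router stays BACKUP if it hears an advertisement from a higher-priority peer, with preemption assumed on), it breaks ties by router name where real VRRP uses the primary IP address, and the names Router, elect, force_master and full_mesh are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical model of one VR in a redundant set."""
    name: str
    priority: int
    # Names of peers whose VRRP advertisements this router currently receives.
    hears: set = field(default_factory=set)


def elect(routers):
    """Simplified VRRP election, assuming preemption is enabled.

    A router stays BACKUP if it hears a peer with a higher (priority, name);
    otherwise it declares itself MASTER. Real VRRP breaks ties on the
    primary IP address; the name is used here only for illustration.
    """
    by_name = {r.name: r for r in routers}
    states = {}
    for r in routers:
        outranked = any(
            (by_name[p].priority, p) > (r.priority, r.name) for p in r.hears
        )
        states[r.name] = "BACKUP" if outranked else "MASTER"
    return states


def force_master(target, routers):
    """Bump target's priority above every router (the 'bump up BACKUP' idea).

    Real VRRP priorities are bounded (255 is reserved for the address
    owner); this sketch does not clamp.
    """
    target.priority = max(r.priority for r in routers) + 1


def full_mesh(routers):
    """Every router hears every other router, i.e. a healthy guest network."""
    names = {r.name for r in routers}
    for r in routers:
        r.hears = names - {r.name}


a, b = Router("vr-a", 100), Router("vr-b", 90)
full_mesh([a, b])
print(elect([a, b]))  # {'vr-a': 'MASTER', 'vr-b': 'BACKUP'}

# CS cannot reach vr-a, so it programs the rule on vr-b and bumps its priority.
force_master(b, [a, b])
print(elect([a, b]))  # {'vr-a': 'BACKUP', 'vr-b': 'MASTER'}

# A third VR whose advertisements are lost in both directions.
c = Router("vr-c", 80)
print(elect([a, b, c]))  # {'vr-a': 'BACKUP', 'vr-b': 'MASTER', 'vr-c': 'MASTER'}
```

The bumped vr-b keeps MASTER when vr-a is heard again, which matches point 4 of the later mail (the old MASTER yields to the updated VR). The last case shows that the single-MASTER outcome holds only while every router hears the others' advertisements: the isolated vr-c elects itself MASTER alongside vr-b.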
>> >> >
>> >> > And if you see RvR as an alternative to the VM HA mechanism, it's not
>> >> > that counterintuitive in fact.
>> >> >
>> >> > --Sheng
>> >> >
>> >> >
>> >> > On Fri, Aug 23, 2013 at 1:56 AM, Roeland Kuipers
>> >> > <rkuip...@schubergphilis.com> wrote:
>> >> >
>> >> >> Hi Sheng,
>> >> >>
>> >> >> So far our testing showed no big problems. I've marked a redundant set
>> >> >> of routers as ha_enabled by setting the ha_enabled bit in the
>> >> >> vm_instance table (this is our workaround ATM). We tested HA in
>> >> >> combination with RvR in the scenarios shutdown / force power off VM.
>> >> >> In these scenarios HA worked a treat and restored the redundant pair
>> >> >> as it should. And keepalived nicely negotiated MASTER & BACKUP.
>> >> >> These are obviously basic tests, but we are happy to do some more
>> >> >> testing.
>> >> >>
>> >> >> I understand your concerns and am totally in favour of the KISS
>> >> >> principle.
>> >> >> What could be the scenario to end up with 3 routers?
>> >> >> Why is the situation complex to deal with? These are separate
>> >> >> mechanisms.
>> >> >> HA just makes sure the router is up and alive, and keepalived
>> >> >> negotiates MASTER-BACKUP states according to the keepalived
>> >> >> configuration, unless there are 3 routers with conflicting configs. But
>> >> >> so far I do not understand the scenario where we could end up with 3
>> >> >> routers, so I cannot judge and/or test this.
>> >> >>
>> >> >> We would like to see the hardcoded denial of HA in a redundant router
>> >> >> setup go, for several reasons:
>> >> >> 1. It's counterintuitive: we configured an HA service offering on
>> >> >> purpose for the RvRs, and found out by accident that it was not
>> >> >> enabled at all.
>> >> >> 2. CS could implement a default offering without HA for this setup (to
>> >> >> keep it simple by default and keep the currently forced behaviour), but if
>> >> >> users, like us, deliberately want HA, they can create a
>> >> >> custom offering with HA enabled.
>> >> >>
>> >> >> This way it's configurable, doesn't change default behavior and is
>> >> >> more intuitive.
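The proposal above amounts to moving the HA decision from a hardcoded rule to the service offering. A minimal sketch under stated assumptions: ServiceOffering, offer_ha, router_ha_current and router_ha_proposed are hypothetical names, not CloudStack's actual classes, and the "current" rule encodes only what this thread says (HA forced off for redundant routers); following the offering for non-redundant routers is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ServiceOffering:
    """Hypothetical stand-in for a router service offering."""
    name: str
    offer_ha: bool


def router_ha_current(offering: ServiceOffering, redundant: bool) -> bool:
    # Behaviour described in this thread: HA is forced off for redundant
    # routers regardless of the offering.
    if redundant:
        return False
    return offering.offer_ha


def router_ha_proposed(offering: ServiceOffering, redundant: bool) -> bool:
    # Proposal: honour the offering (redundant is ignored on purpose).
    # A default RvR offering with offer_ha=False keeps today's behaviour;
    # a custom offering with offer_ha=True opts in to HA.
    return offering.offer_ha


default_rvr = ServiceOffering("default-rvr", offer_ha=False)
custom_ha = ServiceOffering("custom-rvr-ha", offer_ha=True)

for offering in (default_rvr, custom_ha):
    print(offering.name,
          router_ha_current(offering, redundant=True),
          router_ha_proposed(offering, redundant=True))
```

Only the custom HA offering changes the outcome, which is the "configurable, doesn't change default behavior" property argued for above.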
>> >> >>
>> >> >> Thanks & Cheers,
>> >> >> Roeland
>> >> >>
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Sheng Yang [mailto:sh...@yasker.org]
>> >> >> Sent: Friday 23 August 2013 3:03
>> >> >> To: <dev@cloudstack.apache.org>
>> >> >> Subject: Re: HA redundant virtual router
>> >> >>
>> >> >> It's a design choice; the only reason is that it would be a very complex
>> >> >> situation to deal with. In fact, the redundant router's policy itself
>> >> >> is already very complex...
>> >> >>
>> >> >> We didn't look into the details at the time of implementing the redundant
>> >> >> router, but there are lots of concerns, e.g. a network glitch may
>> >> >> result in 3 routers running in the network, with potentially two of them
>> >> >> in MASTER state.
>> >> >>
>> >> >> Of course discussion is welcome. We just wanted to keep it as simple as
>> >> >> possible at the time.
>> >> >>
>> >> >> --Sheng
>> >> >>
>> >> >>
>> >> >> On Thu, Aug 22, 2013 at 3:31 AM, Daan Hoogland
>> >> >> <dhoogl...@schubergphilis.com> wrote:
>> >> >>
>> >> >> > Dear all,
>> >> >> >
>> >> >> > Schuberg Philis guarantees 100% functional uptime for its
>> >> >> > customers.
>> >> >> > Infrastructure is of course part of this promise, and the easier
>> >> >> > factor in which to provide strong levels of resiliency. For this reason
>> >> >> > we want to use redundant virtual routers together with HA
>> >> >> > functionality.
>> >> >> >
>> >> >> > We see HA and redundant routers as two different methods to provide
>> >> >> > higher levels of uptime.
>> >> >> >
>> >> >> >
>> >> >> > 1.      The redundant router setup takes care of seamless failover
>> >> >> > without lengthy hiccups in the case of a single router failure.
>> >> >> >
>> >> >> > 2.      HA takes care of restarting a failed VM or router, restoring
>> >> >> > connectivity in the case of a single router, or restoring 2N resiliency
>> >> >> > in the case of a redundant router setup.
>> >> >> >
>> >> >> > The combination of these two methods will help us meet our 100%
>> >> >> > promise: we need to restore 2N redundancy ASAP in the case of a
>> >> >> > single component failure, e.g. a router. With these two methods
>> >> >> > combined the system is more autonomous and doesn't need human
>> >> >> > intervention to restore redundancy.
>> >> >> >
>> >> >> > In the current situation we need to page an on-call engineer
>> >> >> > to restore redundancy ASAP, because of the tight SLAs.
>> >> >> > If we could use HA in combination with redundant routers, the on-call
>> >> >> > guy could enjoy his sleep and would be a happier guy :) The present code
>> >> >> > forces the HA offering off on redundant routers, which seems odd.
>> >> >> >
>> >> >> > So my question is: why is it forced off? Is there a technical
>> >> >> > constraint, or is this a design choice we can discuss and maybe revise?
>> >> >> >
>> >> >> > Cheers,
>> >> >> >
>> >> >> >
>> >> >>
>> >> >
>> >
>> >
>>
>
>
