Hi Sheng, thanks. I will read it soon and comment or propose additions/alterations.
mobile bilingual spell checker used

On 6 Sep 2013 00:27, "Sheng Yang" <sh...@yasker.org> wrote:
> Here is the doc.
>
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Redundant+Virtual+Router+Functional+Spec
>
> It's not extremely detailed, but it describes today's design in general terms.
>
> --Sheng
>
> On Thu, Aug 29, 2013 at 8:17 AM, Daan Hoogland <daan.hoogl...@gmail.com> wrote:
>
>> ok,
>>
>> let's postpone the discussion till you are at least half done. We
>> will of course continue to deliberate on what we need internally.
>>
>> Daan
>>
>> On Thu, Aug 29, 2013 at 5:08 PM, Sheng Yang <sh...@yasker.org> wrote:
>> > Hi Daan,
>> >
>> > As I said, I am writing a design doc to describe the current redundant
>> > router policy, to help in understanding the redundant router. Currently it
>> > doesn't support VPC, so how to implement it in VPC is still open for
>> > discussion.
>> >
>> > --Sheng
>> >
>> > On Thu, Aug 29, 2013 at 4:26 AM, Daan Hoogland <daan.hoogl...@gmail.com>
>> > wrote:
>> >>
>> >> Sheng,
>> >>
>> >> just to make sure: you are going to write this document? I see Roeland
>> >> understood your mail like this.
>> >>
>> >> When you do, I'd like you to keep in mind that we also want redundant
>> >> routers within a VPC to ensure ACS upgrades are more seamless for
>> >> customer application groups and DTAP streets. If you need any help
>> >> writing such a doc, let me know.
>> >>
>> >> kind regards,
>> >> Daan
>> >>
>> >> On Thu, Aug 29, 2013 at 1:13 PM, Roeland Kuipers
>> >> <rkuip...@schubergphilis.com> wrote:
>> >> > Hi Sheng,
>> >> >
>> >> > Thanks for the info. Looking forward to the design doc, I trust this
>> >> > will make things clearer.
>> >> > In the meantime we will be doing some research and thinking too, to see
>> >> > how we can improve things to also have HA on the RvR in a safe way.
>> >> > We will share this once ready.
>> >> >
>> >> > Thanks,
>> >> > Roeland
>> >> >
>> >> > From: Sheng Yang [mailto:sh...@yasker.org]
>> >> > Sent: Thursday, 29 August 2013 0:19
>> >> > To: <dev@cloudstack.apache.org>
>> >> > Cc: int-cloud; Daan Hoogland
>> >> > Subject: Re: HA redundant virtual router
>> >> >
>> >> > Hi Roeland,
>> >> >
>> >> > I will write a design doc to explain how the redundant router works
>> >> > currently. For example, for point 2, we have to force BACKUP to become
>> >> > MASTER because:
>> >> >
>> >> > 1. CS cannot communicate with MASTER at the time.
>> >> > 2. CS can communicate with BACKUP.
>> >> > 3. The rule has to be programmed immediately.
>> >> > 4. In case the old MASTER comes back, it should yield to the VR with the
>> >> > updated rule, rather than preempt the updated VR.
>> >> >
>> >> > In this case, CS needs to communicate with the RvR to program the new
>> >> > rule, thus it needs to intervene in the RvR to ensure that if only one VR
>> >> > got the rule, that VR becomes MASTER.
>> >> >
>> >> > Still, I will write a doc later to try to cover every concern of the RvR
>> >> > design.
>> >> >
>> >> > --Sheng
>> >> >
>> >> > On Tue, Aug 27, 2013 at 3:40 AM, Roeland Kuipers
>> >> > <rkuip...@schubergphilis.com> wrote:
>> >> > Hi Sheng,
>> >> >
>> >> > Thanks for your reply. I'll see if we can replay this scenario.
>> >> >
>> >> > With respect to point 1: a good principle IMHO.
>> >> >
>> >> > Point 2: Why do we force a keepalived node to become master and not wait
>> >> > for keepalived to become master? This way there is less reason to
>> >> > intervene and less risk of multiple masters, as we have seen this
>> >> > behavior with RvR without HA in the past. The downside is that updates to
>> >> > rules do not function until backup becomes master. But maybe this is wise
>> >> > anyway, since there is something wrong. This conflicts a bit with point 2
>> >> > as we do intervene here.
>> >> >
>> >> > Point 3: In my opinion keepalived is solid enough to leave this
>> >> > responsibility with keepalived, and CS should just check the state and
>> >> > not fiddle with priorities to force masters, because there is obviously a
>> >> > reason why BACKUP refuses to become master.
>> >> > I think we should let keepalived prevent multiple masters, as it is
>> >> > designed to prevent this. Or do I miss something here?
>> >> > Actually in the scenario you described, with a functioning guest
>> >> > network, keepalived should be able to handle this situation if we make
>> >> > sure all routers have different prios.
>> >> >
>> >> > I am still of the opinion that HA and RvR are different mechanisms.
>> >> >
>> >> > So what do you think is necessary to have the possibility of HA in
>> >> > combination with RvR?
>> >> > We have a clear business requirement to have this implemented in CS. And
>> >> > we have developers willing to create these changes to make this possible.
>> >> > We would also like to see RvR on VPCs and are also willing to contribute
>> >> > this functionality.
>> >> >
>> >> > Thanks for your feedback!
>> >> >
>> >> > Cheers,
>> >> > Roeland
>> >> >
>> >> > -----Original Message-----
>> >> > From: Sheng Yang [mailto:sh...@yasker.org]
>> >> > Sent: Friday, 23 August 2013 23:25
>> >> > To: <dev@cloudstack.apache.org>
>> >> > Subject: Re: HA redundant virtual router
>> >> >
>> >> > Hi Roeland,
>> >> >
>> >> > Thank you for your testing!
>> >> >
>> >> > Power off is not a concern right now, because at that time the VM would
>> >> > disappear anyway.
>> >> >
>> >> > Our concern is more about the case where the VM is still alive but we
>> >> > cannot detect it for a while. For example, a network glitch happens and
>> >> > CS loses its connection to the host temporarily (control network), but
>> >> > the guest network is still working.
>> >> > HA would start another VR, which would possibly result in 3 routers in
>> >> > the guest network (at least for a moment). Many of the policies focus on
>> >> > dealing with these intermediate states. Also, if you unplug the network
>> >> > cable of one host, many things should happen...
>> >> >
>> >> > In RvR we want to make sure:
>> >> > 1. The status is self-governed, with no need for CS to intervene.
>> >> > 2. MASTER always gets the latest rules. That means, if we cannot
>> >> > communicate with MASTER, we turn to BACKUP, program the rule on it
>> >> > and make it MASTER - even though we cannot communicate with MASTER at
>> >> > this time. And BACKUP should be able to become MASTER if we request it.
>> >> > This is achieved by using a script to bump up the priority of BACKUP.
>> >> > 3. We try our best to prevent the dual-MASTER situation. So we program
>> >> > a different priority for each VR, and the MASTER/BACKUP status depends
>> >> > completely on priority.
>> >> >
>> >> > And if you take RvR as an alternative to the VM's HA mechanism, it's not
>> >> > that counterintuitive in fact.
>> >> >
>> >> > --Sheng
>> >> >
>> >> > On Fri, Aug 23, 2013 at 1:56 AM, Roeland Kuipers
>> >> > <rkuip...@schubergphilis.com> wrote:
>> >> >
>> >> >> Hi Sheng,
>> >> >>
>> >> >> So far our testing showed no big problems. I've marked a redundant set
>> >> >> of routers as ha_enabled by setting the ha_enabled bit in the
>> >> >> vm_instance table. (This is our workaround ATM.) We tested HA in
>> >> >> combination with RvR in the scenarios shutdown / force power off VM.
>> >> >> In these scenarios HA worked a treat and restored the redundant pair as
>> >> >> it should. And keepalived nicely negotiated MASTER & BACKUP.
>> >> >> These are obviously basic tests, but we are happy to do some more
>> >> >> testing.
>> >> >>
>> >> >> I understand your concerns and am totally in favour of the KISS
>> >> >> principle.
>> >> >> What could be the scenario to end up with 3 routers?
>> >> >> Why is the situation complex to deal with? These are separate
>> >> >> mechanisms.
>> >> >> HA just makes sure the router is up and alive, and keepalived
>> >> >> negotiates MASTER-BACKUP states according to the keepalived
>> >> >> configuration, unless there are 3 routers with conflicting configs. But
>> >> >> so far I do not understand the scenario where we could end up with 3
>> >> >> routers, so I cannot judge and/or test this.
>> >> >>
>> >> >> We would like to see the hardcoded denial of HA in a redundant router
>> >> >> setup go for several reasons:
>> >> >> 1. It's counterintuitive - we configured an HA service offering on
>> >> >> purpose for the RvRs, and found out by accident that it was not
>> >> >> enabled at all.
>> >> >> 2. CS could implement a default offering without HA for this setup (to
>> >> >> keep it simple by default and keep the currently forced behaviour), but
>> >> >> if users, like us, deliberately want HA, they can create a
>> >> >> custom offering with HA enabled.
>> >> >>
>> >> >> This way it's configurable, doesn't change default behavior and is
>> >> >> more intuitive.
>> >> >>
>> >> >> Thanks & Cheers,
>> >> >> Roeland
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Sheng Yang [mailto:sh...@yasker.org]
>> >> >> Sent: Friday, 23 August 2013 3:03
>> >> >> To: <dev@cloudstack.apache.org>
>> >> >> Subject: Re: HA redundant virtual router
>> >> >>
>> >> >> It's a design choice; the only reason is that it would be a very complex
>> >> >> situation to deal with. In fact the redundant router's own policy
>> >> >> has already become very complex...
>> >> >>
>> >> >> We didn't look into the details at the time of implementing the
>> >> >> redundant router, but there are lots of concerns, e.g. a network glitch
>> >> >> may result in 3 routers running in the network, with potentially two of
>> >> >> them in MASTER state.
>> >> >>
>> >> >> Of course discussion is welcome. We just wanted to keep it as simple as
>> >> >> possible at the time.
>> >> >>
>> >> >> --Sheng
>> >> >>
>> >> >> On Thu, Aug 22, 2013 at 3:31 AM, Daan Hoogland
>> >> >> <dhoogl...@schubergphilis.com> wrote:
>> >> >>
>> >> >> > LS,
>> >> >> >
>> >> >> > Schuberg Philis guarantees 100% functional uptime for their
>> >> >> > customers.
>> >> >> > Infrastructure is of course part of this promise and the easier
>> >> >> > factor to provide strong levels of resiliency. For this reason we
>> >> >> > want to make use of redundant virtual routers together with HA
>> >> >> > functionality.
>> >> >> >
>> >> >> > We see HA and redundant routers as two different methods to provide
>> >> >> > higher levels of uptime.
>> >> >> >
>> >> >> > 1. The redundant router setup takes care of seamless failover without
>> >> >> > lengthy hiccups in the case of a single router failure.
>> >> >> >
>> >> >> > 2. HA takes care of restarting a failed VM or router, restoring
>> >> >> > connectivity in the case of a single router or restoring 2N resiliency
>> >> >> > in the case of a redundant router setup.
>> >> >> >
>> >> >> > The combination of these two methods will help us to meet our 100%
>> >> >> > promise. We need to restore 2N redundancy ASAP in the case of a
>> >> >> > single component failure, e.g. a router. With these two methods
>> >> >> > combined the system is more autonomous and doesn't need human
>> >> >> > intervention to restore redundancy.
>> >> >> >
>> >> >> > In the current situation we need to send a page to an on-call
>> >> >> > engineer to restore redundancy ASAP, because of the tight SLAs,
>> >> >> > whereas if we could use HA in combination with redundant routers, the
>> >> >> > on-call guy could enjoy his sleep and would be a happier guy :) The
>> >> >> > present code forces the HA offering to off on redundant routers, which
>> >> >> > seems odd.
>> >> >> >
>> >> >> > So my question is: Why is it forced to off? Is there a technical
>> >> >> > constraint, or is this a design choice we can discuss and maybe revise?
>> >> >> >
>> >> >> > Cheers,
>> >> >> >
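
For anyone following the thread, the priority policy Sheng describes (a different priority per VR, plus a priority bump on BACKUP when CS cannot reach MASTER) can be sketched roughly as below. This is only an illustrative Python model, not CloudStack or keepalived code: the `Router` class, `elect_master`, `program_rule`, the `bump` value and the priority numbers are all made up for the example, and the real election is done by keepalived on the guest network.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Hypothetical model of one VR in a redundant pair."""
    name: str
    priority: int                 # highest priority on the guest network wins
    rules_version: int = 0        # version of the rule set programmed on this VR
    on_guest_net: bool = True     # can exchange advertisements with its peer
    reachable_by_cs: bool = True  # CS can reach it over the control network


def elect_master(routers):
    """Return the VR with the highest priority among those on the guest network."""
    candidates = [r for r in routers if r.on_guest_net]
    return max(candidates, key=lambda r: r.priority) if candidates else None


def program_rule(routers, new_version, bump=10):
    """Program a rule on every VR that CS can reach.

    If the current MASTER is not reachable by CS, raise the priority of the
    updated VRs above it, so the VR holding the latest rule becomes MASTER and
    the old MASTER yields instead of preempting.
    """
    master = elect_master(routers)
    updated = [r for r in routers if r.reachable_by_cs]
    for r in updated:
        r.rules_version = new_version
    if master is not None and not master.reachable_by_cs:
        for r in updated:
            r.priority = master.priority + bump
    return updated


if __name__ == "__main__":
    r1 = Router("r-1", priority=100)
    r2 = Router("r-2", priority=99)
    pair = [r1, r2]
    print("before:", elect_master(pair).name)
    r1.reachable_by_cs = False  # control-network glitch; guest network still up
    program_rule(pair, new_version=1)
    print("after:", elect_master(pair).name)
```

The sketch only covers the two-router case from points 2 and 3; it says nothing about the three-router situation that HA could create, which is the open question in this thread.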