Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" - draft-ietf-lsr-ospf-bfd-strict-mode-04

Robert Raszuk Mon, 31 Jan 2022 09:41:33 -0800

Les,

BFD does not have a notion of coming up in a hidden way: starting to test
the link and signalling its "UP" state to the client only after some defined
per-client-protocol or per-interface peer period.


BFD holdtime as defined in RFC5880 is meant to keep BFD sessions, and therefore
data probes, down (for example, to spread control plane churn over time when
1000s of BFD peers would all come up at the same time), so it is completely
unrelated to the discussion about BFD clients and link testing before
bringing the OSPF adj. up.

If you want to extend BFD further you are welcome to take it to the BFD WG.
I am cc-ing Greg here as he is one of the best active BFD experts.

But if so, this draft should wait until that happens. Alternatively, as has
been suggested, we could choose to keep BFD simple and have such a timer on
the client side. The behaviour on the client is trivial: perform your client
action (bring the IGP adj. up, bring the BGP session up, insert a static route
into the RIB, etc.) only if timer X has elapsed since the BFD session went UP
and there was no transition to DOWN during that time.
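A minimal sketch of that client-side timer, in Python. The class name, `hold_s` (standing in for "timer X") and the explicit `now` timestamps are assumptions for illustration; a real client would hook into its own BFD notification interface and timer machinery.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientHoldTimer:
    """Hypothetical client-side gate: act only after BFD has stayed UP for hold_s."""

    hold_s: float                      # "timer X" from the proposal, in seconds
    up_since: Optional[float] = None   # time of the latest BFD UP, None while DOWN

    def on_bfd_up(self, now: float) -> None:
        # A fresh UP (re)starts the stability window.
        self.up_since = now

    def on_bfd_down(self) -> None:
        # Any DOWN cancels the window; the next UP starts it over.
        self.up_since = None

    def may_bring_up(self, now: float) -> bool:
        # True once timer X has elapsed with no intervening DOWN.
        return self.up_since is not None and now - self.up_since >= self.hold_s
```

The client polls `may_bring_up()` (or arms a timer for `up_since + hold_s`) and only then brings up the IGP adjacency, BGP session or static route; BFD itself is unchanged.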

Regards,
R.




On Mon, Jan 31, 2022 at 6:31 PM Les Ginsberg (ginsberg) <ginsb...@cisco.com>
wrote:

> Albert –
>
>
>
> We are in full agreement.
>
>
>
> Delays in bringing BFD back up after a previous failure may well be
> warranted in the break-in-middle scenarios.
>
> I am not convinced this needs to be standardized; it seems quite appropriate
> as an implementation choice. But if any discussion were to occur in RFCs, I
> think it should be in some BFD document.
>
>
>
> As this draft is focused on OSPF protocol extensions, I don’t think BFD
> dampening needs to be discussed. In any case it should not alter the
> interaction between BFD and protocols. If it takes longer for BFD to come
> up, that just means the OSPF adjacency will not come up either – which is
> exactly the behavior that is desired.
>
>
>
>     Les
>
>
>
>
>
> *From:* Albert Fu (BLOOMBERG/ 120 PARK) <af...@bloomberg.net>
> *Sent:* Monday, January 31, 2022 6:50 AM
> *To:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>; ketant.i...@gmail.com;
> rob...@raszuk.net
> *Cc:* Acee Lindem (acee) <a...@cisco.com>;
> draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org; lsr@ietf.org
> *Subject:* RE: [Lsr] Working Group Last Call for "OSPF Strict-Mode for
> BFD" - draft-ietf-lsr-ospf-bfd-strict-mode-04
>
>
>
> Hi Les,
>
>
>
> Your scenario below is indeed something we have encountered in our
> production network in the non-strict scenario, due to "flapping" links,
> where the routing protocol could come up before BFD due to a
> "break-in-middle" link issue (the interface stayed up, so the routing
> protocol remained active). Strict mode will address this issue.
>
>
>
> Another point to add is that, as a standard on our interfaces, we safeguard
> against flapping links by configuring interface hold-time/carrier-delay.
> However, this is only useful in situations where the link physically goes
> down (and fast detection is automatic in most implementations).
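To make that limitation concrete, here is a rough sketch of down-side carrier-delay/hold-time debouncing. The function name, the event format and the choice to report UP immediately are assumptions for illustration; vendor behaviour varies.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, bool]  # (time in seconds, True if the physical link is up)


def apply_carrier_delay(events: List[Event], delay_s: float, until: float) -> List[Event]:
    """Return the link-state changes that upper layers would see.

    Assumptions for this sketch: the link starts up, events are time-ordered,
    only DOWN is delayed, and UP is reported immediately once a DOWN has been
    reported (real implementations differ; some delay UP as well).
    """
    reported: List[Event] = []
    down_since: Optional[float] = None  # start of the current physical outage
    reported_up = True                  # what upper layers currently believe

    def flush(now: float) -> None:
        nonlocal reported_up
        # An outage that outlasted the delay is reported when the delay expires.
        if down_since is not None and reported_up and now - down_since >= delay_s:
            reported.append((down_since + delay_s, False))
            reported_up = False

    for t, is_up in events:
        flush(t)
        if not is_up and down_since is None:
            down_since = t              # outage begins: start the delay
        elif is_up and down_since is not None:
            down_since = None           # outage over
            if not reported_up:
                reported.append((t, True))
                reported_up = True
    flush(until)
    return reported
```

A short physical blip (down at 1.0 s, up at 1.25 s with a 0.5 s delay) is hidden from the routing protocols. A break-in-middle failure, however, produces no physical transitions at all, so the event list is empty and nothing is ever reported.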
>
>
>
> Nowadays, it is also common to see "break-in-middle" failures. We use
> BFD to detect this sort of failure in under a second. And to dampen these
> break-in-middle failures, we will need to use BFD
> holdtime/dampening.
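One possible design for BFD dampening is an exponential back-off hold-down, similar in spirit to route flap dampening. The sketch below assumes that design; the class name and all parameter values (`initial_s`, `max_s`, `stable_s`) are hypothetical and not taken from any particular implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BfdBackoff:
    """Hypothetical exponential back-off hold-down for one BFD session."""

    initial_s: float = 1.0    # hold-down after a first (or long-isolated) flap
    max_s: float = 8.0        # cap on the hold-down
    stable_s: float = 60.0    # UP this long resets the back-off
    wait_s: float = 0.0       # hold-down applied at the most recent DOWN
    hold_until: float = 0.0   # session may not come UP before this time
    up_since: Optional[float] = None

    def on_up(self, now: float) -> None:
        self.up_since = now

    def on_down(self, now: float) -> float:
        stable = self.up_since is not None and now - self.up_since >= self.stable_s
        if stable or self.wait_s == 0.0:
            self.wait_s = self.initial_s                    # first flap, or was stable
        else:
            self.wait_s = min(self.wait_s * 2, self.max_s)  # repeated flap: back off
        self.up_since = None
        self.hold_until = now + self.wait_s
        return self.wait_s

    def may_come_up(self, now: float) -> bool:
        # Gate for re-establishing the session after a DOWN.
        return now >= self.hold_until
```

Successive flaps give hold-downs of 1, 2, 4, 8, 8 seconds; a session that then stays UP for at least `stable_s` starts again from 1 second on its next failure.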
>
>
>
> Thanks
>
>
>
> Albert
>
>
>
>
>
>
>
> From: ginsb...@cisco.com At: 01/30/22 14:38:37 UTC-5:00
>
> To: rob...@raszuk.net, ketant.i...@gmail.com
> Cc: Albert Fu (BLOOMBERG/ 120 PARK ) <af...@bloomberg.net>, a...@cisco.com,
> draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org, lsr@ietf.org
> Subject: RE: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD"
> - draft-ietf-lsr-ospf-bfd-strict-mode-04
>
>
>
> Robert –
>
>
>
> Here is what you said (emphasis added):
>
>
>
> <snip>
>
> But the timer I am suggesting is not related to BFD operation, but to OSPF
> (and/or ISIS). It is not about BFD sessions being UP or DOWN. It is about
> *allowing BFD for more testing (with various parameters (for example
> increasing test packet size in some discrete steps)* before OSPF is happy to
> bring the adj. up.
>
> <end snip>
>
>
>
> Point #1: If you want BFD to do more testing (such as MTU testing) then
> clearly you need extensions to BFD (such as
> https://datatracker.ietf.org/doc/draft-ietf-bfd-large-packets/ )
>
>
>
> Point #2: The existing timers (which, as Ketan points out, are mentioned in
> Section 5) are applied today at the OSPF level precisely because OSPF does
> not currently have strict-mode operation. So in a flapping scenario you
> could see the following behavior:
>
>
>
> a) BFD goes down
>
> b) OSPF goes down in response to BFD
>
> c) OSPF comes back up
>
> d) Link is still unstable – so traffic is being dropped some of the time –
> but perhaps OSPF adjacency stays up (i.e., OSPF hellos get through often
> enough to keep the OSPF adjacency up)
>
>
>
> So some implementations have chosen to insert a delay following “b”. This
> doesn’t guarantee stability, but hopefully makes it less likely. And
> because OSPF today does NOT wait for BFD to come up, the delay has to be
> implemented at the OSPF level.
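A rough sketch of that OSPF-level delay in non-strict mode. The neighbor FSM is collapsed into a single `adjacent` flag and the names are hypothetical; the point is that once the delay expires, BFD state is never consulted, so the adjacency can return while BFD is still down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NonStrictNeighbor:
    """Hypothetical non-strict OSPF neighbor with a post-BFD-down delay."""

    delay_s: float
    bfd_down_at: Optional[float] = None  # time of the last BFD-triggered teardown
    adjacent: bool = True

    def on_bfd_down(self, now: float) -> None:
        self.adjacent = False            # step b): OSPF goes down in response to BFD
        self.bfd_down_at = now

    def on_hello(self, now: float, bfd_up: bool) -> bool:
        # bfd_up is deliberately ignored: that is what "non-strict" means here.
        if self.bfd_down_at is None or now - self.bfd_down_at >= self.delay_s:
            self.adjacent = True         # step c): OSPF comes back up
        return self.adjacent
```

With a 10 s delay, a Hello at 5 s is ignored, but a Hello at 10 s brings the adjacency back even though `bfd_up` is still `False`.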
>
>
>
> Once you have strict mode support, the sequence becomes:
>
>
>
> a) BFD goes down
>
> b) OSPF goes down in response to BFD
>
> c) BFD comes back up
>
> d) OSPF comes back up
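A simplified sketch of where the strict-mode gate sits in that sequence. It assumes a point-to-point link, assumes the gate is applied at the Init to 2-Way step, and collapses ExStart/Exchange/Loading into a single FULL state; the enum and function names are illustrative only.

```python
from enum import Enum


class NbrState(Enum):
    DOWN = 0
    INIT = 1
    TWO_WAY = 2
    FULL = 3  # ExStart/Exchange/Loading collapsed into FULL for brevity


def next_state(state: NbrState, two_way_seen: bool, bfd_up: bool, strict: bool) -> NbrState:
    """One step of a heavily simplified OSPF neighbor FSM with an optional strict-mode gate."""
    if state is NbrState.INIT and two_way_seen:
        if strict and not bfd_up:
            # Strict mode: stay in Init until BFD reports UP (assumed gate location).
            return NbrState.INIT
        return NbrState.TWO_WAY
    if state is NbrState.TWO_WAY:
        return NbrState.FULL  # database exchange assumed to succeed
    return state  # Down handling and BFD-down teardown are omitted here
```

Without strict mode the neighbor leaves Init as soon as bidirectional Hellos are seen; with it, step d) cannot happen before step c).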
>
>
>
> Now, if the concern is that BFD comes back up while the link is still
> unstable, the way to address that is to put a delay either before BFD
> attempts to bring up a new session or a delay after achieving UP state
> before it signals UP to its clients – such as OSPF. This is a better
> solution because all BFD clients benefit from this. And if the link is still
> unstable, it is more likely that the BFD session will go down during the
> delay period than it would be for OSPF because the BFD timers are
> significantly more aggressive.
>
> (BTW, this behavior can be done w/o a BFD protocol extension – it is
> purely an implementation choice.)
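A sketch of the second option described above: the BFD session reaches UP internally but signals UP to its clients only after a delay, so every client (OSPF, BGP, ...) benefits at once. The class, the `tick()` polling model and the string events are hypothetical; no BFD protocol change is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BfdSessionWithUpDelay:
    """Hypothetical BFD session that reports UP only after up_delay_s of continuous UP."""

    up_delay_s: float
    clients: List[Callable[[str], None]] = field(default_factory=list)
    up_since: Optional[float] = None
    signalled_up: bool = False

    def session_up(self, now: float) -> None:
        self.up_since = now  # protocol reached UP: start the delay, tell no one yet

    def session_down(self) -> None:
        was_signalled = self.signalled_up
        self.up_since = None
        self.signalled_up = False
        if was_signalled:
            self._notify("DOWN")  # clients only hear DOWN if they heard UP

    def tick(self, now: float) -> None:
        # Called periodically; releases UP to every client once the delay expires.
        if (self.up_since is not None and not self.signalled_up
                and now - self.up_since >= self.up_delay_s):
            self.signalled_up = True
            self._notify("UP")

    def _notify(self, event: str) -> None:
        for client in self.clients:
            client(event)
```

A flap during the delay is absorbed inside BFD and no client ever sees it; because BFD timers are far more aggressive than OSPF's, an unstable link is likely to fail within that window.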
>
>
>
> From a design perspective, dampening is always best done at the lowest
> layer possible. In most cases, interface layer dampening is best. If that
> is not reliable for some reason, then move one layer up – not two layers up.
>
>
>
>    Les
>
>
>
>
>
> *From:* Robert Raszuk <rob...@raszuk.net>
> *Sent:* Sunday, January 30, 2022 10:05 AM
> *To:* Ketan Talaulikar <ketant.i...@gmail.com>
> *Cc:* Les Ginsberg (ginsberg) <ginsb...@cisco.com>; Acee Lindem (acee) <
> a...@cisco.com>; draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org; Albert Fu <
> af...@bloomberg.net>; lsr <lsr@ietf.org>
> *Subject:* Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for
> BFD" - draft-ietf-lsr-ospf-bfd-strict-mode-04
>
>
>
> Hi Ketan,
>
>
>
> I would like to point out that the draft discusses the BFD "dampening" or
> "hold-down" mechanism in Sec 5. We are aware of BFD implementations that
> include such mechanisms in a protocol-agnostic manner.
>
>
>
> BFD dampening or hold-time are completely orthogonal to my point. Neither
> has anything to do with it.
>
>
>
> Those timers only fire when BFD goes down. In my example, BFD does not go
> down. But we want to bring up the client adj. only after X ms/sec/min etc.
> of normal BFD operation, if no failure is detected while that timer runs.
>
>
>
> This draft indicates that OSPF adjacency will "advance" in the neighbor
> FSM only after BFD reports UP.
>
>
>
> And that is exactly what is too soon. In fact, if you do that today without
> waiting some time (if you retire the current OSPF timer), you will not help
> at all in the case you are trying to address.
>
>
>
> The reason is that BFD may go down perhaps 200 ms after reaching UP, but by
> then the OSPF adj. will already be established. It is really pretty simple.
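A worked timeline for this 200 ms example, as a sketch. `hold_s=0` models strict mode with no extra client timer; a non-zero `hold_s` adds the client-side timer proposed earlier in the thread. The function name and trace format are assumptions, and the returned time is when adjacency formation may start, not when it completes.

```python
from typing import List, Optional, Tuple


def adjacency_up_at(bfd_events: List[Tuple[float, str]], hold_s: float) -> Optional[float]:
    """Earliest time the client may bring the adjacency up, or None.

    bfd_events: time-ordered (time_s, "UP" | "DOWN") BFD state changes.
    Assumes the session stays in its last state after the final event.
    """
    up_since: Optional[float] = None
    for t, state in bfd_events:
        if up_since is not None and t - up_since >= hold_s:
            return up_since + hold_s  # BFD stayed UP for hold_s before this event
        up_since = t if state == "UP" else None
    return None if up_since is None else up_since + hold_s
```

For the trace `[(0.0, "UP"), (0.2, "DOWN"), (1.0, "UP")]`, `hold_s=0.0` lets the adjacency start at 0.0 s, just before BFD fails at 0.2 s, while `hold_s=5.0` waits until 6.0 s, five seconds after the second UP.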
>
>
>
> Thx,
>
> Robert.
>
>
>
> PS. And yes, I think ISIS should also get fixed in that respect.
>
>
>

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
