Robert –

The paragraph you quote below has to do with BGP behavior in the event that the “BFD 
session does not transition to the Up state”.
There is no disagreement about what the protocol (BGP or OSPF) should do in 
this case. The point of strict-mode is to “wait-for-BFD”.

You, however, are trying to introduce some additional requirements. To this end 
you said:

“What I find missing in the draft is a mutually (between OSPF peers) timer 
fired after BFD session is up which in OSPF could hold on allowing BFD to do 
some more testing before declaring adj to be established. I think just bringing 
OSPF adj immediately after the BFD session is up is not a good thing.”

So apparently you want BFD to signal UP – but have the protocol do nothing 
until BFD completes some additional testing. What then was the point of BFD 
signaling UP to OSPF? And since you want the additional testing to be done by 
BFD, what new signal should BFD send to OSPF when this is done?
The point of BFD sending UP to its clients is to indicate that BFD thinks the 
link has been verified from the BFD perspective. I do not see the point of 
sending two such signals. If you think current BFD testing is inadequate, please 
ask for extensions to BFD (in the BFD WG).

You also said:

“BFD is a great tool to tell you if the end to end path is UP or DOWN. It was 
not designed to give you any characteristics or metrics for the path quality.”

I agree. But if you are now proposing that protocol adjacencies should not come 
up until certain link quality metrics are met (e.g., link loss, delay) – you 
are moving into an area that is completely out of scope of this draft.
I won’t dig deeper into what could be a very lengthy discussion. If you really 
want to pursue this idea, I suggest you write a new draft.

   Les

From: Robert Raszuk <rob...@raszuk.net>
Sent: Monday, January 31, 2022 6:59 AM
To: Albert Fu <af...@bloomberg.net>; Les Ginsberg (ginsberg) 
<ginsb...@cisco.com>; Ketan Talaulikar <ketant.i...@gmail.com>
Cc: Acee Lindem (acee) <a...@cisco.com>; 
draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org; lsr <lsr@ietf.org>
Subject: Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" - 
draft-ietf-lsr-ospf-bfd-strict-mode-04

Les & Ketan

Nowadays, it is also common to see "break-in-middle" failures. We use BFD 
to detect this sort of failure within a sub-second interval. And to dampen such 
break-in-middle failures, we will need to use BFD hold-time/dampening.

Another data point relevant to the above and to this discussion, from a draft 
that Albert co-authored:

Ref: https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-bfd-strict-mode

Please see the paragraph below, which explicitly defines a BGP BFD Hold time:

   If the BFD session does not transition to the Up state, and the
   HoldTimer has been negotiated to a non-zero value, the BGP FSM will
   close the session appropriately.  If the HoldTimer has been
   negotiated to a zero value, the session should be closed after a time
   of X.  This time X is referred as "BGP BFD Hold time".  The proposed
   default BGP BFD Hold time value is 30 seconds.  The BGP BFD Hold time
   value is configurable.

To me it is clear that the BGP BFD Hold time sits on the client side and here 
affects the BGP FSM.
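A minimal Python sketch of the close-session rule in the quoted paragraph, to make concrete that this timer is evaluated by the BGP client rather than by BFD. The function names are hypothetical, times are assumed to be in seconds, and reading "close the session appropriately" as ordinary HoldTimer expiry is an assumption, not text from the draft.

```python
DEFAULT_BGP_BFD_HOLD_TIME = 30.0  # proposed default in the quoted paragraph (configurable)


def close_deadline(negotiated_hold_timer: float,
                   bgp_bfd_hold_time: float = DEFAULT_BGP_BFD_HOLD_TIME) -> float:
    """Seconds after which BGP gives up on a peer whose BFD session never reached Up."""
    if negotiated_hold_timer > 0:
        # Assumption: "close the session appropriately" is read as HoldTimer expiry.
        return negotiated_hold_timer
    # HoldTimer negotiated to zero: fall back to the client-side "BGP BFD Hold time" X.
    return bgp_bfd_hold_time


def should_close(elapsed: float, bfd_up: bool, negotiated_hold_timer: float,
                 bgp_bfd_hold_time: float = DEFAULT_BGP_BFD_HOLD_TIME) -> bool:
    """True if the BGP client should close the session while still waiting for BFD."""
    if bfd_up:
        return False  # BFD reached Up, so the strict-mode wait is over
    return elapsed >= close_deadline(negotiated_hold_timer, bgp_bfd_hold_time)
```

For example, should_close(31, bfd_up=False, negotiated_hold_timer=0) returns True once the default 30-second BGP BFD Hold time has passed without BFD reaching Up.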

Thx,
Robert.







From: ginsb...@cisco.com At: 01/30/22 14:38:37 UTC-5:00
To: rob...@raszuk.net, ketant.i...@gmail.com
Cc: Albert Fu (BLOOMBERG/ 120 PARK ) <af...@bloomberg.net>, a...@cisco.com, 
draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org, lsr@ietf.org
Subject: RE: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" - 
draft-ietf-lsr-ospf-bfd-strict-mode-04

Robert –

Here is what you said (emphasis added):

<snip>
But the timer I am suggesting is not related to BFD operation, but to OSPF 
(and/or ISIS). It is not about BFD sessions being UP or DOWN. It is about 
allowing BFD for more testing (with various parameters (for example increasing 
test packet size in some discrete steps) before OSPF is happy to bring the adj. 
up.
<end snip>

Point #1: If you want BFD to do more testing (such as MTU testing) then clearly 
you need extensions to BFD (such as 
https://datatracker.ietf.org/doc/draft-ietf-bfd-large-packets/ )

Point #2: The existing timers (which, as Ketan points out, are mentioned in 
Section 5) are applied today at the OSPF level precisely because OSPF does not 
currently have strict-mode operation. So in a flapping scenario you could see the 
following behavior:

a) BFD goes down
b) OSPF goes down in response to BFD
c) OSPF comes back up
d) Link is still unstable – so traffic is being dropped some of the time – but 
perhaps OSPF adjacency stays up (i.e., OSPF hellos get through often enough to 
keep the OSPF adjacency up)

So some implementations have chosen to insert a delay following “b”. This 
doesn’t guarantee stability, but hopefully makes it less likely. And because 
OSPF today does NOT wait for BFD to come up, the delay has to be implemented at 
the OSPF level.
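The OSPF-level delay described above can be sketched as follows; note that it never consults BFD state, which is exactly the limitation strict mode removes. The class name and its single hold_down parameter are hypothetical and do not correspond to any particular implementation.

```python
from typing import Optional


class OspfHoldDown:
    """Hypothetical OSPF-level hold-down applied after a BFD-triggered adjacency drop."""

    def __init__(self, hold_down: float) -> None:
        self.hold_down = hold_down                  # seconds to wait after step "b"
        self.last_bfd_down: Optional[float] = None  # time OSPF last went down due to BFD

    def on_bfd_down(self, now: float) -> None:
        self.last_bfd_down = now

    def may_reform_adjacency(self, now: float) -> bool:
        # Without strict mode OSPF does not wait for BFD Up, so elapsed time since
        # the last BFD-triggered drop is the only brake; BFD state is never checked.
        if self.last_bfd_down is None:
            return True
        return now - self.last_bfd_down >= self.hold_down
```

Once the hold-down expires, the adjacency may re-form even if the link is still lossy, which is the exposure described in step d).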

Once you have strict mode support, the sequence becomes:

a) BFD goes down
b) OSPF goes down in response to BFD
c) BFD comes back up
d) OSPF comes back up
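A simplified replay of the two sequences, assuming (hypothetically) that the strict-mode gate holds the neighbor in Init until BFD reports Up. States after Init are collapsed into a single "Full" for brevity, and the event names are illustrative only.

```python
from enum import Enum
from typing import List


class NbrState(Enum):
    DOWN = "Down"
    INIT = "Init"
    FULL = "Full"  # stands in for 2-Way through Full


def replay(events: List[str], strict: bool) -> List[str]:
    """Return the OSPF neighbor state after each event, starting from an Up adjacency."""
    state = NbrState.FULL
    bfd_up = True
    trace = []
    for ev in events:
        if ev == "bfd_down":
            bfd_up = False
            state = NbrState.DOWN  # OSPF goes down in response to BFD
        elif ev == "bfd_up":
            bfd_up = True
            if state is NbrState.INIT:
                state = NbrState.FULL  # strict mode: BFD Up releases the gate
        elif ev == "hello":
            if state is NbrState.DOWN:
                state = NbrState.INIT
            # Non-strict mode advances on the hello alone; strict mode needs BFD Up.
            if state is NbrState.INIT and (bfd_up or not strict):
                state = NbrState.FULL
        else:
            raise ValueError(f"unknown event {ev!r}")
        trace.append(state.value)
    return trace
```

replay(["bfd_down", "hello", "bfd_up"], strict=True) gives ["Down", "Init", "Full"], while strict=False gives ["Down", "Full", "Full"]: without strict mode the first hello brings OSPF back before BFD does.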

Now, if the concern is that BFD comes back up while the link is still unstable, 
the way to address that is to put a delay either before BFD attempts to bring 
up a new session or a delay after achieving UP state before it signals UP to 
its clients – such as OSPF. This is a better solution because all BFD clients 
benefit from this. And if the link is still unstable, it is more likely that the 
BFD session will go down during the delay period than it would be for OSPF 
because the BFD timers are significantly more aggressive.
(BTW, this behavior can be done w/o a BFD protocol extension – it is purely an 
implementation choice.)
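A minimal sketch of the BFD-side delay described above, written as a pure function over a timeline of BFD session transitions. The function name and the delay and horizon parameters are hypothetical; the point is only that a session which drops during the delay never signals Up to any client.

```python
from typing import List, Optional, Tuple


def client_up_notifications(transitions: List[Tuple[float, bool]],
                            delay: float, horizon: float) -> List[float]:
    """Return the times at which BFD would signal Up to its clients.

    transitions: time-ordered (time, session_is_up) pairs for the BFD session itself.
    delay: hypothetical interval the session must stay Up before clients are told.
    horizon: end of the observed timeline.
    """
    notifications = []
    up_since: Optional[float] = None
    for t, is_up in transitions:
        if is_up:
            if up_since is None:
                up_since = t  # session just reached Up; start the delay
        else:
            if up_since is not None and up_since + delay < t:
                notifications.append(up_since + delay)  # delay expired before the drop
            up_since = None  # a drop during the delay suppresses the Up signal
    if up_since is not None and up_since + delay <= horizon:
        notifications.append(up_since + delay)
    return notifications
```

For a session that fails 200 ms after coming Up, [(0.0, True), (0.2, False), (1.0, True)] with delay=0.0 gives [0.0, 1.0] (clients see Up twice), while delay=5.0 and horizon=10.0 gives [6.0]. The same arithmetic would apply if the timer lived in the OSPF client, as Robert proposes; the disagreement is about which layer owns it.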

From a design perspective, dampening is always best done at the lowest layer 
possible. In most cases, interface layer dampening is best. If that is not 
reliable for some reason, then move one layer up – not two layers up.

   Les


From: Robert Raszuk <rob...@raszuk.net>
Sent: Sunday, January 30, 2022 10:05 AM
To: Ketan Talaulikar <ketant.i...@gmail.com>
Cc: Les Ginsberg (ginsberg) <ginsb...@cisco.com>; 
Acee Lindem (acee) <a...@cisco.com>; 
draft-ietf-lsr-ospf-bfd-strict-m...@ietf.org; 
Albert Fu <af...@bloomberg.net>; lsr <lsr@ietf.org>
Subject: Re: [Lsr] Working Group Last Call for "OSPF Strict-Mode for BFD" - 
draft-ietf-lsr-ospf-bfd-strict-mode-04

Hi Ketan,

I would like to point out that the draft discusses the BFD "dampening" or 
"hold-down" mechanism in Sec 5. We are aware of BFD implementations that 
include such mechanisms in a protocol-agnostic manner.

BFD dampening and hold-time are completely orthogonal to my point. Neither has 
anything to do with it.

Those timers only fire when BFD goes down. In my example BFD does not go down. 
Rather, we want to bring up the client adjacency only after X ms/sec/min etc. of 
normal BFD operation, provided no failure is detected while that timer runs.

This draft indicates that OSPF adjacency will "advance" in the neighbor FSM 
only after BFD reports UP.

And that is exactly what is too soon. In fact, if you do that today without 
waiting some time (i.e., if you retire the current OSPF timer), you will not help 
at all in the case you are trying to address.

The reason is that perhaps 200 ms after BFD comes UP it will go down again, but 
by then the OSPF adjacency will already have been established. It is really 
pretty simple.

Thx,
Robert.

PS. And yes, I think ISIS should also be fixed in that respect.

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
