Re: [Lsr] BFD aspects

Greg Mirsky Mon, 29 Nov 2021 20:08:33 -0800

Hi Aijun,
thank you for clarifying your goal. I missed asking another question:


What is the required failure detection time?

For example, a 10 ms detection guarantee is required for local protection.
That results in a 3.3 ms interval between the fault detection packets
(e.g., CCM or BFD). As I understand it, the IGP is likely to rely on
single-hop BFD detection. Hence, it takes 10 ms before the PE's neighbor
discovers the failure. Only then will the IGP processes start acting. Thus,
I don't see how the IGP can guarantee anything less than 10 ms. Would you
agree?
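
To make the arithmetic explicit, here is a minimal Python sketch. It assumes
a detect multiplier of 3, which is what the 10 ms and 3.3 ms figures imply.
The function names are only illustrative, and the IGP processing delay is a
hypothetical placeholder, not a measured value.

```python
def required_tx_interval_ms(detection_ms: float, detect_mult: int = 3) -> float:
    """Packet interval needed to meet a detection-time target.

    Assumes a failure is declared after detect_mult consecutive packets
    are missed (detect_mult = 3 is an assumption, not a given).
    """
    return detection_ms / detect_mult


def detection_time_ms(tx_interval_ms: float, detect_mult: int = 3) -> float:
    """Time to declare a failure for a given packet interval."""
    return tx_interval_ms * detect_mult


def igp_notification_floor_ms(detection_ms: float,
                              igp_processing_ms: float = 0.0) -> float:
    """Earliest time a remote node can learn of the failure via the IGP.

    igp_processing_ms is a hypothetical placeholder for flooding and
    processing delay; it can only add to the single-hop detection time.
    """
    if igp_processing_ms < 0:
        raise ValueError("processing delay cannot be negative")
    return detection_ms + igp_processing_ms


if __name__ == "__main__":
    interval = required_tx_interval_ms(10.0)
    print(f"interval for a 10 ms guarantee: {interval:.1f} ms")
    print(f"detection time at 3.3 ms: {detection_time_ms(3.3):.1f} ms")
    print(f"IGP floor: {igp_notification_floor_ms(10.0):.1f} ms")
```

Any non-zero IGP processing delay only pushes the result further above the
single-hop detection time.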

Regards,
Greg

On Mon, Nov 29, 2021 at 7:38 PM Aijun Wang <[email protected]>
wrote:

> Hi, Greg:
>
>
>
> I understand that BFD can offer a guaranteed failure detection time,
> unlike other protocols whose detection time depends on the size of the
> network.
>
> What we want to emphasize is the balance between deployment/operational
> overhead and the efficiency of the proposed solutions.
>
> As for your question, I think we can still get millisecond-level failure
> detection via the IGP itself (far faster than the BGP hello timer in the
> BGP use case, and also a benefit for tunnel services that have no hello
> timer).
>
> The actual time should certainly be verified later in a simulation
> environment or in a real network deployment.
>
>
>
> Best Regards
>
>
>
> Aijun Wang
>
> China Telecom
>
>
>
> *From:* Greg Mirsky <[email protected]>
> *Sent:* Tuesday, November 30, 2021 11:11 AM
> *To:* Aijun Wang <[email protected]>
> *Cc:* lsr <[email protected]>; Gyan Mishra <[email protected]>; Robert
> Raszuk <[email protected]>
> *Subject:* Re: [Lsr] BFD aspects
>
>
>
> Hi Aijun,
>
> what is the guaranteed failure detection time for the IGP-based solution?
>
>
>
> Regards,
>
> Greg
>
>
>
> On Mon, Nov 29, 2021 at 7:07 PM Aijun Wang <[email protected]>
> wrote:
>
> Hi, Greg:
>
>
>
> Even if the BFD auto-configuration extensions are standardized and
> implemented, won’t the network be filled with detection packets instead
> of user packets?
>
> For the PUA/PULSE solution, the LSA in question is generated only when
> the node status changes from “UP” to “DOWN”, whereas BFD packets are sent
> continuously while these PEs are active.
>
> Which one is more efficient?
>
>
>
> Certainly, we will consider massive failure situations, even though they
> occur only in very rare circumstances.
>
>
>
> Best Regards
>
>
>
> Aijun Wang
>
> China Telecom
>
>
>
> *From:* Greg Mirsky <[email protected]>
> *Sent:* Tuesday, November 30, 2021 10:47 AM
> *To:* Aijun Wang <[email protected]>
> *Cc:* lsr <[email protected]>; Gyan Mishra <[email protected]>; Robert
> Raszuk <[email protected]>
> *Subject:* Re: [Lsr] BFD aspects
>
>
>
> Hi Aijun,
>
> thank you for confirming that it is not the conclusion one can arrive at
> based on my discussion with Robert. Secondly, I wouldn't characterize the
> problem you describe as a scaling issue with using multi-hop BFD to
> monitor path continuity in the underlay network. In my opinion, it is an
> operational overhead that can be addressed by an intelligent management
> plane or a few extensions in the control plane that sets up the overlay.
> Since the management plane is usually a proprietary solution, I invite
> anyone interested to work on BFD auto-configuration extensions in the
> control plane. I would much appreciate references to the use cases that
> can benefit from such extensions.
>
>
>
> Regards,
>
> Greg
>
>
>
> On Mon, Nov 29, 2021 at 6:26 PM Aijun Wang <[email protected]>
> wrote:
>
> Hi, Greg:
>
>
>
> Firstly, regardless of which method is used for the multihop BFD
> approach, there is certainly configuration overhead if you imagine there
> are 10,000 PEs, an example Tony has often raised.
>
> Wouldn’t you have to configure each pair of them to monitor the PE-PE
> connection?
>
> It is obviously not scalable.
>
>
>
>
>
> Best Regards
>
>
>
> Aijun Wang
>
> China Telecom
>
>
>
> *From:* Greg Mirsky <[email protected]>
> *Sent:* Tuesday, November 30, 2021 10:18 AM
> *To:* Aijun Wang <[email protected]>
> *Cc:* Gyan Mishra <[email protected]>; Robert Raszuk <
> [email protected]>; lsr <[email protected]>
> *Subject:* Re: [Lsr] BFD aspects
>
>
>
> Hi Aijun,
>
> could you please elaborate on how you see that this discussion leads to
> the "BFD based detection for the mentioned problem is not [...]
> scalable(among PEs)" conclusion? I hope that nothing I've said or
> suggested led you to this conclusion. Personally, I believe that BFD-based
> PE-PE monitoring is the best technical solution. I understand that an
> operator may be dissatisfied with the additional configuration of the BFD
> sessions. As noted, I believe that can be addressed in the management
> plane or by minor extensions in the control plane (BGP or not). If a
> particular implementation (or a combination of the implementation and HW)
> has a scaling challenge with multi-hop BFD, then that may not be
> sufficient technical justification for a somewhat controversial proposal.
>
>
>
> Regards,
>
> Greg
>
>
>
> On Mon, Nov 29, 2021 at 5:17 PM Aijun Wang <[email protected]>
> wrote:
>
> From the discussion, I think we can draw the conclusion that BFD based
> detection for the mentioned problem is not reliable (between PE/RR) and
> scalable(among PEs).
>
> The same applies to the BGP based solution.
>
>
>
> So let’s focus on how to implement it within the IGP. Thanks to Greg for
> the analysis.
>
> And one supplement to Robert’s comments: the RR is not always located
> within the same area as the PEs, so it can’t learn immediately that a PE
> node is down when summarization is configured between areas.
>
>
>
> Best Regards
>
>
>
> Aijun Wang
>
> China Telecom
>
>
>
> *From:* [email protected] <[email protected]> *On Behalf Of *Gyan
> Mishra
> *Sent:* Tuesday, November 30, 2021 8:44 AM
> *To:* Robert Raszuk <[email protected]>
> *Cc:* Greg Mirsky <[email protected]>; lsr <[email protected]>
> *Subject:* Re: [Lsr] BFD aspects
>
>
>
>
>
> Robert
>
>
>
> On Mon, Nov 29, 2021 at 7:35 PM Robert Raszuk <[email protected]> wrote:
>
> Hi Greg,
>
>
>
> If BFD would have autodiscovery built in, that would indeed be the
> ultimate solution. Of course folks will worry about scaling and number of
> BFD sessions to be run PE-PE.
>
> GIM>> I sense that it is not "BFD autodiscovery" but an advertisement of
> BFD multi-hop system readiness to the particular PE. That, as I think of
> it, can be done in a control or management plane.
>
>
>
> Agreed.
>
>
>
> But if BFD between all PEs were an option, why would RR to PE in the
> local area not be a viable solution?
>
>
>
> GIM>>Because, in the case of PE-PE, BFD control packets will be
> fate-sharing with data packets. But the path between RR and PE might not be
> used for carrying data packets at all.
>
>
>
> 100%. But that was accounted for. Reason being that you have at least
> two RRs in an area. The point of BFD was to detect that the PE went down.
>
>
>
> Gyan> What Greg is alluding to is a very good point to consider: in many
> operator networks the RR sits in the “control plane” path, which is
> separate from the data plane path. So the RR has no knowledge of the E2E
> forwarding plane path between the PEs, as it sits outside that path. The
> PE-to-RR path is therefore disjoint from the PE-PE path, so from its
> PE-RR point of view the RR may wrongly think the PE is up or down, hence
> the false positive or negative. That would be the case regardless of how
> many RRs are deployed.
>
>
>
> You are absolutely right that it may report RR disconnect from the network
> while PE is up and data plane from remote PEs can reach it. That is why we
> have more than one RR.
>
>
>
> As far as fate sharing PE-PE BFD with real user data - I think it is not
> always the case. But this is completely separate discussion :)
>
>
>
> Also please keep in mind that PE going down can be learned by RRs by
> listening to the IGP. No BFD needed.
>
>
>
> Both would be multihop, both would be subject to all transit failures etc
> ...
>
> GIM>> I think that there's a difference in the impact a path failure has
> on the data traffic. Monitoring the PE-PE path in the underlay, using the
> same encapsulation as data traffic, is representative of the data
> experience. A failure of the PE-RR path, in my understanding, may not be
> representative at all. A BFD session between RR and PE may fail while the
> PE is absolutely functional from the service PoV.
>
>
>
> Please keep in mind that this entire discussion is not about data plane
> failure end to end :)  Yes, it's pretty sad. This entire debate  is to
> indicate domain wide that the IGP component on a PE went down.
>
>
>
> No one considers data plane liveness and even as you observed data plane
> encapsulation congruence. Clearly this is not a true OAM discussion.
>
>
>
> On the other hand, PE might be disconnected from the service while the BFD
> session to RR is in the Up state.
>
>
>
> Not likely if you keep in mind that to trigger any remote action such
> failure would have to happen to all RRs.
>
>
>
> Thx a lot,
> R.
>
>
>
> _______________________________________________
> Lsr mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/lsr
>
> --
>
> *Gyan Mishra*
>
> *Network Solutions Architect *
>
> *Email [email protected] <[email protected]>*
>
> *M 301 502-1347*
>
>
>
>
>

