Greg,

Thank you so much for your input to this discussion. As you can see, it is
not easy to convince some folks :)

I just want to clarify one thing with respect to using multihop BFD between
the RR and PE. That suggestion has nothing to do with data plane path
detection. Basically, BFD is used here as a better ping. No more, no less.

A few points:

#1 - Yes, if my control plane RR-PE network fails while the normal data
plane to the PE is still up, I will have a false positive. Solution: use
more than one RR (see the sketch after these points).

#2 - Yes, BFD process liveness (or the lack of it) on the PE does not
guarantee service liveness of the PE. That is always true, as BFD does not
check real service data plane processing anyway.

#3 - If my network path to the PE fails but RR-PE works fine, the PE will
be considered alive. Network-down detection is not the goal of the
PUA/PULSE discussion.
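
To make the multi-RR point concrete, here is a rough sketch (hypothetical
names, not any particular implementation) of the "declare the PE down only
when every RR has lost its BFD session" logic behind point #1:

    # A single RR losing its control-plane path to the PE is treated as
    # a false positive; the PE is declared down only when ALL RRs agree.
    def pe_is_down(bfd_up_per_rr):
        """bfd_up_per_rr: dict mapping RR name -> True if that RR's
        multihop BFD session to the PE is currently Up."""
        return not any(bfd_up_per_rr.values())

    print(pe_is_down({"rr1": False, "rr2": True}))   # False: PE still alive
    print(pe_is_down({"rr1": False, "rr2": False}))  # True: PE declared down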

Cheers,
R.





On Tue, Nov 30, 2021 at 5:08 AM Greg Mirsky <gregimir...@gmail.com> wrote:

> Hi Aijun,
> thank you for clarifying your goal. I missed asking another question:
>
> What is the required failure detection time?
>
> For example, a 10 ms detection guarantee is required for local protection.
> With a detect multiplier of 3, that results in a 3.3 ms interval between
> the fault detection packets (e.g., CCM or BFD). As I understand it, the IGP
> is likely to rely on single-hop BFD detection. Hence, it takes 10 ms before
> the PE's neighbor discovers the failure. Only then do the IGP processes
> start acting. Thus, I don't see how the IGP can guarantee anything less
> than 10 ms. Would you agree?
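>
> A quick sketch of that arithmetic (illustrative only):
>
>     # BFD declares a failure after detect_mult consecutive packets are
>     # missed, so detection time = tx interval * detect multiplier.
>     def required_tx_interval_ms(detection_time_ms, detect_mult=3):
>         return detection_time_ms / detect_mult
>
>     print(required_tx_interval_ms(10))  # ~3.33 ms between BFD packets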
>
> Regards,
> Greg
>
> On Mon, Nov 29, 2021 at 7:38 PM Aijun Wang <wangai...@tsinghua.org.cn>
> wrote:
>
>> Hi, Greg:
>>
>>
>>
>> I understand that BFD can provide a guaranteed failure detection time,
>> unlike other protocols whose detection time depends on the size of the
>> network.
>>
>> What we want to emphasize is the balance between deployment/operation
>> overhead and the efficiency of the proposed solutions.
>>
>> For your question, I think we can still get millisecond failure
>> detection time via the IGP itself (far faster than the BGP hold timer for
>> the BGP use case, and also a benefit for tunnel services that have no
>> hello timer).
>>
>> The actual time should certainly be verified later in a simulation
>> environment or in a real network deployment.
>>
>>
>>
>> Best Regards
>>
>>
>>
>> Aijun Wang
>>
>> China Telecom
>>
>>
>>
>> *From:* Greg Mirsky <gregimir...@gmail.com>
>> *Sent:* Tuesday, November 30, 2021 11:11 AM
>> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
>> *Cc:* lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert
>> Raszuk <rob...@raszuk.net>
>> *Subject:* Re: [Lsr] BFD aspects
>>
>>
>>
>> Hi Aijun,
>>
>> what is the guaranteed failure detection time for the IGP-based solution?
>>
>>
>>
>> Regards,
>>
>> Greg
>>
>>
>>
>> On Mon, Nov 29, 2021 at 7:07 PM Aijun Wang <wangai...@tsinghua.org.cn>
>> wrote:
>>
>> Hi, Greg:
>>
>>
>>
>> Even if the BFD auto-configuration extensions were standardized and
>> implemented, wouldn't the network be filled with detection packets
>> instead of user packets?
>>
>> For the PUA/PULSE solution, the mentioned LSA will be emitted only when
>> the node status changes from “UP” to “DOWN”, but BFD packets will be
>> sent continuously while these PEs are active.
>>
>> Which one is more efficient?
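>>
>> As a rough back-of-the-envelope sketch of that asymmetry (illustrative
>> numbers only):
>>
>>     # A full mesh of BFD sessions sends packets continuously, while an
>>     # event-driven LSA is flooded only when a node actually goes down.
>>     def bfd_packets_per_second(num_pes, tx_interval_ms):
>>         sessions = num_pes * (num_pes - 1)  # each PE polls every other PE
>>         return sessions * (1000.0 / tx_interval_ms)
>>
>>     print(f"{bfd_packets_per_second(10000, 3.3):.2e}")  # ~3.0e10 pkts/s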
>>
>>
>>
>> Certainly, we will consider massive failure situations, even though they
>> will occur only in very rare circumstances.
>>
>>
>>
>> Best Regards
>>
>>
>>
>> Aijun Wang
>>
>> China Telecom
>>
>>
>>
>> *From:* Greg Mirsky <gregimir...@gmail.com>
>> *Sent:* Tuesday, November 30, 2021 10:47 AM
>> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
>> *Cc:* lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert
>> Raszuk <rob...@raszuk.net>
>> *Subject:* Re: [Lsr] BFD aspects
>>
>>
>>
>> Hi Aijun,
>>
>> thank you for confirming that it is not the conclusion one can arrive at
>> based on my discussion with Robert. Secondly, I wouldn't characterize the
>> problem you describe as a scaling issue with using multi-hop BFD to
>> monitor path continuity in the underlay network. In my opinion, it is an
>> operational overhead that can be addressed by an intelligent management
>> plane or by a few extensions in the control plane that sets up the
>> overlay. Since the management plane is usually a proprietary solution, I
>> invite anyone interested to work on BFD auto-configuration extensions in
>> the control plane. I would much appreciate references to use cases that
>> can benefit from such extensions.
>>
>>
>>
>> Regards,
>>
>> Greg
>>
>>
>>
>> On Mon, Nov 29, 2021 at 6:26 PM Aijun Wang <wangai...@tsinghua.org.cn>
>> wrote:
>>
>> Hi, Greg:
>>
>>
>>
>> Firstly, regardless of which method is used for the multihop BFD
>> approach, it is certainly a configuration overhead if you imagine there
>> are 10,000 PEs, as Tony often raised as one example.
>>
>> Wouldn’t you have to configure a session for each pair of them to detect
>> the PE-PE connection?
>>
>> It is obviously not scalable.
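>>
>> A quick sketch of the session count involved (illustrative only):
>>
>>     # A full mesh of multihop BFD sessions among N PEs needs
>>     # N*(N-1)/2 sessions, i.e. N-1 sessions configured on each PE.
>>     def full_mesh_sessions(num_pes):
>>         return num_pes * (num_pes - 1) // 2
>>
>>     print(full_mesh_sessions(10000))  # 49,995,000 sessions network-wide
>>     print(10000 - 1)                  # 9,999 sessions per PE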
>>
>>
>>
>>
>>
>> Best Regards
>>
>>
>>
>> Aijun Wang
>>
>> China Telecom
>>
>>
>>
>> *From:* Greg Mirsky <gregimir...@gmail.com>
>> *Sent:* Tuesday, November 30, 2021 10:18 AM
>> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
>> *Cc:* Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk <
>> rob...@raszuk.net>; lsr <lsr@ietf.org>
>> *Subject:* Re: [Lsr] BFD aspects
>>
>>
>>
>> Hi Aijun,
>>
>> could you please elaborate on how you see this discussion leading to the
>> "BFD based detection for the mentioned problem is not [...]
>> scalable (among PEs)" conclusion? I hope that nothing I've said or
>> suggested led you to this conclusion. Personally, I believe that
>> BFD-based PE-PE monitoring is the best technical solution. I understand
>> that an operator may be dissatisfied with the additional configuration of
>> the BFD sessions. As noted, I believe that can be addressed in the
>> management plane or with minor extensions in the control plane (BGP or
>> not). If a particular implementation (or a combination of the
>> implementation and HW) has a scaling challenge with multi-hop BFD, then
>> that would not be sufficient technical justification for a somewhat
>> controversial proposal.
>>
>>
>>
>> Regards,
>>
>> Greg
>>
>>
>>
>> On Mon, Nov 29, 2021 at 5:17 PM Aijun Wang <wangai...@tsinghua.org.cn>
>> wrote:
>>
>> From the discussion, I think we can conclude that BFD-based detection
>> for the mentioned problem is neither reliable (between PE/RR) nor
>> scalable (among PEs).
>>
>> The same applies to the BGP-based solution.
>>
>>
>>
>> So let’s focus on how to implement it within the IGP. Thanks for Greg’s
>> analysis.
>>
>> And one supplement to Robert’s comments: the RR is not always located
>> within the same area as the PEs, and in that case it cannot learn
>> immediately that a PE node went down when summarization is configured
>> between areas.
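>>
>> A tiny illustration of why summarization hides the failure (documentation
>> prefixes, hypothetical values):
>>
>>     # The ABR advertises only the summary into the backbone, so the
>>     # PE loopback /32 disappearing inside the area changes nothing
>>     # from the remote RR/PE point of view.
>>     import ipaddress
>>     summary = ipaddress.ip_network("192.0.2.0/24")      # ABR summary
>>     pe_loopback = ipaddress.ip_network("192.0.2.7/32")  # failed PE
>>     print(pe_loopback.subnet_of(summary))  # True: still covered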
>>
>>
>>
>> Best Regards
>>
>>
>>
>> Aijun Wang
>>
>> China Telecom
>>
>>
>>
>> *From:* lsr-boun...@ietf.org <lsr-boun...@ietf.org> *On Behalf Of *Gyan
>> Mishra
>> *Sent:* Tuesday, November 30, 2021 8:44 AM
>> *To:* Robert Raszuk <rob...@raszuk.net>
>> *Cc:* Greg Mirsky <gregimir...@gmail.com>; lsr <lsr@ietf.org>
>> *Subject:* Re: [Lsr] BFD aspects
>>
>>
>>
>>
>>
>> Robert
>>
>>
>>
>> On Mon, Nov 29, 2021 at 7:35 PM Robert Raszuk <rob...@raszuk.net> wrote:
>>
>> Hi Greg,
>>
>>
>>
>> If BFD had autodiscovery built in, that would indeed be the ultimate
>> solution. Of course, folks will worry about scaling and the number of BFD
>> sessions to be run PE-PE.
>>
>> GIM>> I sense that it is not "BFD autodiscovery" but an advertisement of
>> BFD multi-hop system readiness to the particular PE. That, as I think of
>> it, can be done in a control or management plane.
>>
>>
>>
>> Agreed.
>>
>>
>>
>> But if BFD between all PEs would be an option, why would RR-to-PE in the
>> local area not be a viable solution?
>>
>>
>>
>> GIM>> Because, in the case of PE-PE, BFD control packets will be
>> fate-sharing with data packets. But the path between the RR and PE might
>> not be used for carrying data packets at all.
>>
>>
>>
>> 100%. But that was accounted for, the reason being that you have at
>> least two RRs in an area. The point of BFD was to detect that the PE went
>> down.
>>
>>
>>
>> Gyan> What Greg is alluding to is a very good point to consider: in many
>> operator networks, the RR sits in the “control plane” path, which is
>> separate from the data plane path. So the RR has no knowledge of the E2E
>> forwarding plane path between the PEs, as it sits outside the forwarding
>> plane path. That being said, the PE-to-RR path is disjoint from the PE-PE
>> path, so from the PE-RR point of view the RR may think the PE is up or
>> down, hence the false positive or negative. That would be the case
>> regardless of how many RRs are deployed.
>>
>>
>>
>> You are absolutely right that it may report an RR disconnect from the
>> network while the PE is up and the data plane from remote PEs can reach
>> it. That is why we have more than one RR.
>>
>>
>>
>> As for PE-PE BFD fate-sharing with real user data - I think that is not
>> always the case. But this is a completely separate discussion :)
>>
>>
>>
>> Also, please keep in mind that RRs can learn that a PE went down by
>> listening to the IGP. No BFD needed.
>>
>>
>>
>> Both would be multihop, and both would be subject to all transit
>> failures, etc.
>>
>> GIM>> I think that there's a difference in the impact a path failure has
>> on the data traffic. Monitoring the PE-PE path in the underlay, using the
>> same encapsulation as the data traffic, is representative of the data
>> experience. A failure of the PE-RR path, in my understanding, may not be
>> representative at all. A BFD session between the RR and PE may fail while
>> the PE is absolutely functional from the service PoV.
>>
>>
>>
>> Please keep in mind that this entire discussion is not about end-to-end
>> data plane failure :) Yes, it's pretty sad. This entire debate is about
>> indicating domain-wide that the IGP component on a PE went down.
>>
>>
>>
>> No one is considering data plane liveness, or even, as you observed,
>> data plane encapsulation congruence. Clearly this is not a true OAM
>> discussion.
>>
>>
>>
>> On the other hand, the PE might be disconnected from the service while
>> the BFD session to the RR is in the Up state.
>>
>>
>>
>> Not likely, if you keep in mind that to trigger any remote action, such
>> a failure would have to happen on all RRs.
>>
>>
>>
>> Thx a lot,
>> R.
>>
>>
>>
>> --
>>
>> *Gyan Mishra*
>>
>> *Network Solutions Architect *
>>
>> *Email gyan.s.mis...@verizon.com <gyan.s.mis...@verizon.com>*
>>
>> *M 301 502-1347*
>>
>>
>>
>>
_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
