Hi Robert, thank you for your kind words and the discussion. Please find my notes in-lined below and tagged GIM2>>.
Regards,
Greg

On Tue, Nov 30, 2021 at 1:11 AM Robert Raszuk <rob...@raszuk.net> wrote:

> Greg,
>
> Thank you so much for your input to this discussion. As you can see it is
> not easy to convince some folks :)
>
> I just want to clarify one thing in respect to using multihop BFD between
> RR and PE. There is nothing about data plane path detection with that
> suggestion. Basically BFD is used here as a better ping. No more no less.
>
> Few points:
>
> #1 - Yes, if my control plane RR-PE network fails and the normal data
> plane to PE is still up I will have a false positive. Solution: Use more
> than one RR.
>
GIM2>> Now we have to reconcile states reported by RRs. Doable but adds complexity.
>
> #2 - Yes BFD process liveness or not on PE does not guarantee service
> liveness of the PE - that is always true as BFD does not check real service
> data plane processing anyway
>
GIM2>> Agree,
>
> #3 - If my network to PE fails but RR-PE works fine PE will be considered
> alive. Network down detection is not the goal for discussing PUA/PULSE.
>
GIM2>> Thank you for the clarification. Though I wonder how such separation benefits network operation.
>
> Cheers,
> R.
>
> On Tue, Nov 30, 2021 at 5:08 AM Greg Mirsky <gregimir...@gmail.com> wrote:
>
>> Hi Aijun,
>> thank you for clarifying your goal. I have missed asking another question:
>>
>> What is the required failure detection time?
>>
>> For example, a 10 ms detection guarantee is required for local
>> protection. And that results in a 3.3 ms interval between the fault
>> detection packets (e.g., CCM or BFD). As I understand it, IGP is likely to
>> rely on single-hop BFD detection. Hence, 10 ms before PE's neighbor
>> discovers the failure. Then the IGP processes will start acting. Thus, I
>> don't see how IGP can guarantee anything less than 10 ms. Would you agree?
>>
>> Regards,
>> Greg
>>
>> On Mon, Nov 29, 2021 at 7:38 PM Aijun Wang <wangai...@tsinghua.org.cn>
>> wrote:
>>
>>> Hi, Greg:
>>>
>>> I understand that BFD can get the guaranteed failure detection time than
>>> other protocol that depends on the size of the network.
>>>
>>> What we want to emphasize is that the balance of deployment/operation
>>> overhead and the efficiency of the proposed solutions.
>>>
>>> For your questions, I think we can still get the millisecond failure
>>> detection time via the IGP itself(Far faster than the BGP hello timer for
>>> BGP use case; and also benefit for the tunnel services that has no hello
>>> timer).
>>>
>>> The actual time should certainly be verified later in simulation
>>> environment or in real network deployment.
>>>
>>> Best Regards
>>>
>>> Aijun Wang
>>>
>>> China Telecom
>>>
>>> *From:* Greg Mirsky <gregimir...@gmail.com>
>>> *Sent:* Tuesday, November 30, 2021 11:11 AM
>>> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
>>> *Cc:* lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert
>>> Raszuk <rob...@raszuk.net>
>>> *Subject:* Re: [Lsr] BFD aspects
>>>
>>> Hi Aijun,
>>>
>>> what is the guaranteed failure detection time for the IGP-based solution?
>>>
>>> Regards,
>>>
>>> Greg
>>>
>>> On Mon, Nov 29, 2021 at 7:07 PM Aijun Wang <wangai...@tsinghua.org.cn>
>>> wrote:
>>>
>>> Hi, Greg:
>>>
>>> Even the BFD auto-configuration extensions has been standardized and
>>> implemented, won’t the network be filled with the detect packets,
>>> instead of the user packets?
>>>
>>> For PUA/PULSE solution, the mentioned LSA will only be emerged when the
>>> node status change from “UP” to “DOWN”, but the BFD packet will be sent
>>> continuously when these PEs are active.
>>>
>>> Which one is efficient?
>>>
>>> Certainly, we will consider the massive failure situations, even it will
>>> occur in very rare circumstances.
>>>
>>> Best Regards
>>>
>>> Aijun Wang
>>>
>>> China Telecom
>>>
>>> *From:* Greg Mirsky <gregimir...@gmail.com>
>>> *Sent:* Tuesday, November 30, 2021 10:47 AM
>>> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
>>> *Cc:* lsr <lsr@ietf.org>; Gyan Mishra <hayabusa...@gmail.com>; Robert
>>> Raszuk <rob...@raszuk.net>
>>> *Subject:* Re: [Lsr] BFD aspects
>>>
>>> Hi Aijun,
>>>
>>> thank you for confirming that it is not the conclusion one can arrive
>>> based on my discussion with Robert. Secondly, the problem you describe, I
>>> wouldn't characterize as a scaling issue with using multi-hop BFD
>>> monitoring path continuity in the underlay network. In my opinion, it is an
>>> operational overhead that can be addressed by an intelligent management
>>> plane or a few extensions in the control plane that is setting an overlay.
>>> Since the management plane is usually a proprietary solution, I invite
>>> anyone interested in working on BFD auto-configuration extensions in the
>>> control plane. I much appreciate references to the use cases that can
>>> benefit from such extensions.
>>>
>>> Regards,
>>>
>>> Greg
>>>
>>> On Mon, Nov 29, 2021 at 6:26 PM Aijun Wang <wangai...@tsinghua.org.cn>
>>> wrote:
>>>
>>> Hi, Greg:
>>>
>>> Firstly, regardless of which methods to be used for the multihop BFD
>>> approach, it is certainly the configuration overhead if you image there are
>>> 10,000 PEs as Tony often raised as one example.
>>>
>>> Shouldn’t you configure each pair of them to detect the PE-PE
>>> connection?
>>>
>>> It is obvious not scalable.
>>>
>>> Best Regards
>>>
>>> Aijun Wang
>>>
>>> China Telecom
>>>
>>> *From:* Greg Mirsky <gregimir...@gmail.com>
>>> *Sent:* Tuesday, November 30, 2021 10:18 AM
>>> *To:* Aijun Wang <wangai...@tsinghua.org.cn>
>>> *Cc:* Gyan Mishra <hayabusa...@gmail.com>; Robert Raszuk <
>>> rob...@raszuk.net>; lsr <lsr@ietf.org>
>>> *Subject:* Re: [Lsr] BFD aspects
>>>
>>> Hi Aijun,
>>>
>>> could you please elaborate on how you see that this discussion leads to
>>> the "BFD based detection for the mentioned problem is not [...]
>>> scalable(among PEs)" conclusion? I hope that there's nothing I've said or
>>> suggested lead you to this conclusion. Personally, I believe that BFD-based
>>> PE-PE is the best technical solution. I understand that an operator may be
>>> dissatisfied with the additional configuration of the BFD session. As
>>> noted, I believe that can be addressed in the management plane or minor
>>> extensions in the control plane (BGP or not). If a particular
>>> implementation (or a combination of the implementation and HW) has a
>>> scaling challenge with multi-hop BFD, then that could be not enough
>>> sufficient technical justification for a somewhat controversial proposal.
>>>
>>> Regards,
>>>
>>> Greg
>>>
>>> On Mon, Nov 29, 2021 at 5:17 PM Aijun Wang <wangai...@tsinghua.org.cn>
>>> wrote:
>>>
>>> From the discussion, I think we can get the conclusion that BFD based
>>> detection for the mentioned problem is not reliable (between PE/RR) and
>>> scalable(among PEs).
>>>
>>> Then also the BGP based solution.
>>>
>>> So let’s focus how to implement it within the IGP? Thanks Greg’s
>>> analysis.
>>>
>>> And one supplement for Robert’s comments: RR is always not located
>>> within the same area as PEs, then can’t know the down of PE nodes
>>> immediately when the summary is configured between areas.
>>>
>>> Best Regards
>>>
>>> Aijun Wang
>>>
>>> China Telecom
>>>
>>> *From:* lsr-boun...@ietf.org <lsr-boun...@ietf.org> *On Behalf Of *Gyan
>>> Mishra
>>> *Sent:* Tuesday, November 30, 2021 8:44 AM
>>> *To:* Robert Raszuk <rob...@raszuk.net>
>>> *Cc:* Greg Mirsky <gregimir...@gmail.com>; lsr <lsr@ietf.org>
>>> *Subject:* Re: [Lsr] BFD aspects
>>>
>>> Robert
>>>
>>> On Mon, Nov 29, 2021 at 7:35 PM Robert Raszuk <rob...@raszuk.net> wrote:
>>>
>>> Hi Greg,
>>>
>>> If BFD would have autodiscovery built in, that would indeed be the
>>> ultimate solution. Of course folks will worry about scaling and number of
>>> BFD sessions to be run PE-PE.
>>>
>>> GIM>> I sense that it is not "BFD autodiscovery" but an advertisement of
>>> BFD multi-hop system readiness to the particular PE. That, as I think of
>>> it, can be done in a control or management plane.
>>>
>>> Agreed.
>>>
>>> But if BFD between all PEs would be an option why RR to PE in the local
>>> area would not be a viable solution ?
>>>
>>> GIM>>Because, in the case of PE-PE, BFD control packets will be
>>> fate-sharing with data packets. But the path between RR and PE might not be
>>> used for carrying data packets at all.
>>>
>>> 100%. But that was accounted for. Reason being that you have at least
>>> two RRs in an area. The point of BFD was to use detect that PE went down.
>>>
>>> Gyan> What Greg is alluding is a very good point to consider is that the
>>> RR in many cases in operator networks sit in the “control plane” path
>>> which is separate from the data plane path. So the E2E forwarding plane
>>> path between the PEs, the RR has no knowledge as is it sits outside the
>>> forwarding plane path. That being said the PE to RR path is disjoint from
>>> the PE-PE path so from the PE-RR RR POV may think the PE is up or down
>>> thus the false positive or negative.
>>> That would be the case regardless of
>>> how many RRs are deployed.
>>>
>>> You are absolutely right that it may report RR disconnect from the
>>> network while PE is up and data plane from remote PEs can reach it. That is
>>> why we have more than one RR.
>>>
>>> As far as fate sharing PE-PE BFD with real user data - I think it is not
>>> always the case. But this is completely separate discussion :)
>>>
>>> Also please keep in mind that PE going down can be learned by RRs by
>>> listening to the IGP. No BFD needed.
>>>
>>> Both would be multihop, both would be subject to all transit failures
>>> etc ...
>>>
>>> GIM>> I think that there's a difference between the impact a path
>>> failure has on the data traffic. In the case of monitoring PE-PE path in
>>> the underlay and using the same encapsulation as data traffic is
>>> representative of the data experience. A failure of the PE-RR path, in my
>>> understanding, may be not representative at all. BFD session between RR and
>>> PE may fail while PE is absolutely functional from the service PoV.
>>>
>>> Please keep in mind that this entire discussion is not about data plane
>>> failure end to end :) Yes, it's pretty sad. This entire debate is to
>>> indicate domain wide that the IGP component on a PE went down.
>>>
>>> No one considers data plane liveness and even as you observed data plane
>>> encapsulation congruence. Clearly this is not a true OAM discussion.
>>>
>>> On the other hand, PE might be disconnected from the service while the
>>> BFD session to RR is in the Up state.
>>>
>>> Not likely if you keep in mind that to trigger any remote action such
>>> failure would have to happen to all RRs.
>>>
>>> Thx a lot,
>>> R.
>>>
>>> _______________________________________________
>>> Lsr mailing list
>>> Lsr@ietf.org
>>> https://www.ietf.org/mailman/listinfo/lsr
>>>
>>> --
>>>
>>> <http://www.verizon.com/>
>>>
>>> *Gyan Mishra*
>>>
>>> *Network Solutions Architect *
>>>
>>> *Email gyan.s.mis...@verizon.com <gyan.s.mis...@verizon.com>*
>>>
>>> *M 301 502-1347*
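
Editor's sketch: the exchange in this thread about using more than one RR (Robert's point #1, Greg's reply that the states reported by the RRs then have to be reconciled, and Robert's remark that a failure "would have to happen to all RRs") can be illustrated with a few lines of Python. This is only a minimal sketch under assumptions that are not in the thread: the names `Liveness` and `reconcile` and the RR labels are hypothetical, and the rule "declare the PE down only when every RR reports it down" is one possible reading of Robert's remark, not a specified procedure.

```python
from enum import Enum
from typing import Mapping


class Liveness(Enum):
    """PE state as seen by one RR over its multihop BFD session (hypothetical model)."""
    UP = "up"
    DOWN = "down"


def reconcile(reports: Mapping[str, Liveness]) -> Liveness:
    """Combine per-RR reports about a single PE.

    Assumed rule: the PE is treated as DOWN only if there is at least one
    report and every RR reports DOWN. A single RR losing its path to the PE
    (the false positive from point #1) is therefore ignored.
    """
    if reports and all(state is Liveness.DOWN for state in reports.values()):
        return Liveness.DOWN
    return Liveness.UP


if __name__ == "__main__":
    # One RR lost its control-plane path to the PE, the other still sees it.
    print(reconcile({"rr1": Liveness.DOWN, "rr2": Liveness.UP}))
    # Both RRs lost the PE.
    print(reconcile({"rr1": Liveness.DOWN, "rr2": Liveness.DOWN}))
```

Even in this toy form the rule needs a place where all reports meet, which is the added complexity Greg points out. It also does nothing for point #3: a PE that is cut off from the data plane but still reachable from the RRs stays UP.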
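
Editor's sketch: two pieces of arithmetic carry much of the argument in this thread, namely Greg's "10 ms detection, 3.3 ms interval" example and Aijun's concern about configuring BFD between every pair of 10,000 PEs. The Python below works both out. It is a rough sketch under stated assumptions: the function names are hypothetical, the detect multiplier of 3 is inferred from the 10 ms / 3.3 ms figures rather than stated in the thread, and the RR-PE count assumes every RR runs one session to every PE.

```python
def detection_time_ms(tx_interval_ms: float, detect_mult: int = 3) -> float:
    """Failure detection time = detect multiplier x packet interval."""
    return detect_mult * tx_interval_ms


def max_interval_ms(target_detection_ms: float, detect_mult: int = 3) -> float:
    """Largest packet interval that still meets the target detection time."""
    return target_detection_ms / detect_mult


def full_mesh_sessions(num_pes: int) -> int:
    """Number of PE-PE sessions if every pair of PEs is monitored."""
    return num_pes * (num_pes - 1) // 2


def rr_pe_sessions(num_pes: int, num_rrs: int) -> int:
    """Number of RR-PE sessions, assuming each RR monitors each PE."""
    return num_pes * num_rrs


if __name__ == "__main__":
    print(f"interval for a 10 ms target: {max_interval_ms(10.0):.2f} ms")
    print(f"detection at 3.3 ms interval: {detection_time_ms(3.3):.1f} ms")
    print(f"PE-PE full mesh, 10,000 PEs: {full_mesh_sessions(10_000):,} sessions")
    print(f"RR-PE, 10,000 PEs and 2 RRs: {rr_pe_sessions(10_000, 2):,} sessions")
```

With 10,000 PEs a strict full mesh comes to 49,995,000 sessions network-wide, that is 9,999 per PE. A real deployment would presumably only need sessions toward the PEs a given PE actually exchanges service traffic with, so this figure is an upper bound rather than a prediction.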