Re: [Lsr] BGP vs PUA/PULSE

Aijun Wang Mon, 24 Jan 2022 01:18:43 -0800

Hi, Christian:

OK, let's try to converge on one standard. But it should be IGP-based; 
otherwise it should be discussed in another WG.



Best Regards

Aijun Wang
China Telecom

-----Original Message-----
From: Christian Hopps <[email protected]> 
Sent: Monday, January 24, 2022 11:48 AM
To: Aijun Wang <[email protected]>
Cc: 'Christian Hopps' <[email protected]>; [email protected]; 'John E Drake' 
<[email protected]>; 'Robert Raszuk' <[email protected]>; 'Les Ginsberg 
(ginsberg)' <[email protected]>; 'Shraddha Hegde' <[email protected]>; 
'Tony Li' <[email protected]>; 'Hannes Gredler' <[email protected]>; 'lsr' 
<[email protected]>; 'Peter Psenak (ppsenak)' <[email protected]>
Subject: Re: [Lsr] BGP vs PUA/PULSE


[This is with my chair hat on; however, I am not making any actual calls on 
rough consensus, as that would come from both Acee and me after we have discussed 
things and come to an agreement]

First: Just because people are disagreeing doesn't mean that we accept failure 
in this WG and start "standardizing" multiple solutions to everything. As a 
group we will discuss things and then the WG chairs will make a call on whether 
we have rough consensus or *not*.

The recent acceptance of multiple experimental track drafts was borderline a 
(big) mistake, and one huge negative against doing it was that someone was 
inevitably going to suggest we do it the next time they wanted to get their draft 
published when there was any sort of contention over solutions.

Our job as a WG is not to publish RFCs simply because people write drafts. It's 
to produce standards for solutions (*) to problems (*) that need fixing (*).

As a WG we have to decide if (*) {is good enough, that actually exist, are 
worth trying to fix}. The chairs have to decide if the WG has reached at least 
a rough consensus on all of the (*)s, and if not, we simply do not move forward 
with that work.

Not getting agreement from a bunch of experts in the WG basically means the 
work is not going to move forward.

Thanks,
Chris.
[As WG chair]


"Aijun Wang" <[email protected]> writes:

> Hi, Christian, Acee:
>
> It seems that the experts within LSR can't converge on one standard 
> solution, so how about we forward these drafts to the experimental 
> track as well, and let implementations and the market decide?
> With experimental deployment experience, we can identify the most suitable 
> solution.
>
> Best Regards
>
> Aijun Wang
> China Telecom
>
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of 
> Christian Hopps
> Sent: Saturday, January 15, 2022 10:07 AM
> To: Aijun Wang <[email protected]>
> Cc: John E Drake <[email protected]>; Robert Raszuk 
> <[email protected]>; Les Ginsberg (ginsberg) <[email protected]>; 
> Christian Hopps <[email protected]>; Shraddha Hegde 
> <[email protected]>; Tony Li <[email protected]>; Hannes Gredler 
> <[email protected]>; lsr <[email protected]>; Peter Psenak (ppsenak) 
> <[email protected]>
> Subject: Re: [Lsr] BGP vs PUA/PULSE
>
> Yes, having worked intimately with these IGPs for > 20 years now, I 
> understand the use and the implications of using summary routes. :)
>
> My opinion remains unchanged.
>
> Thanks,
> Chris.
> [as wg member]
>
>> On Jan 14, 2022, at 8:50 PM, Aijun Wang <[email protected]> wrote:
>>
>> Hi, Christian:
>>
>> We should consider the balance of efficiency between summarizing and not summarizing.
>> If you don’t summarize, then every area will be filled with the 
>> specific detail routes (all PE loopbacks, possibly all P loopbacks 
>> as well). This certainly increases the burden on the routers.
>>
>> With summarization, these specific routes need not exist in the 
>> routing table. The nodes within the IGP only need to be notified when 
>> a node fails, so that the overlay service can switch over faster.
>> You can also choose not to use such a mechanism, in which case the service will 
>> be blackholed for some time until the service/application detects the abnormal 
>> condition.
>> PUA/PULSE are simply mechanisms to shorten that abnormal period; they are a 
>> kind of FRR technique.
>>
>> Aijun Wang
>> China Telecom
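The trade-off described above can be illustrated with a toy longest-prefix-match model in Python. This is a minimal sketch under assumed conditions, not how PUA or PULSE are encoded or implemented: the prefixes, the function names `longest_match` and `next_hop_usable`, and the modelling of an unreachable-prefix notification as a plain set of prefixes are all hypothetical. It contrasts three cases for an ingress PE resolving a remote PE loopback: no summarization, summary only, and summary plus a PUA/PULSE-style notification.

```python
import ipaddress

# Hypothetical addressing: the ABR advertises only the summary 10.0.0.0/16
# into the ingress area; the remote PE loopback 10.0.1.1/32 lies inside it.
SUMMARY = ipaddress.ip_network("10.0.0.0/16")
PE_LOOPBACK = ipaddress.ip_network("10.0.1.1/32")
REMOTE_PE = ipaddress.ip_address("10.0.1.1")


def longest_match(rib, addr):
    """Return the most specific prefix in rib that covers addr, or None."""
    matches = [p for p in rib if addr in p]
    return max(matches, key=lambda p: p.prefixlen, default=None)


def next_hop_usable(rib, unreachable, addr):
    """Hypothetical resolution check for a BGP next hop: usable only if
    some route covers it and no unreachable-prefix notification names it."""
    if any(addr in p for p in unreachable):
        return False
    return longest_match(rib, addr) is not None


# Case 1: no summarization. The PE fails and its /32 is withdrawn,
# leaving no covering route at all.
case1 = next_hop_usable(set(), set(), REMOTE_PE)

# Case 2: summary only. The failure is invisible; the summary still matches.
case2 = next_hop_usable({SUMMARY}, set(), REMOTE_PE)

# Case 3: summary plus a PUA/PULSE-style notification for the failed /32.
case3 = next_hop_usable({SUMMARY}, {PE_LOOPBACK}, REMOTE_PE)

print(case1, case2, case3)  # False True False
```

Case 2 is the blackhole window discussed in this thread: the covering summary keeps matching, so the loss of the individual /32 is invisible at the ingress. Case 1 corresponds to not summarizing the loopbacks at all.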
>>
>>> On Jan 15, 2022, at 09:26, Christian Hopps <[email protected]> wrote:
>>>
>>>> On Jan 14, 2022, at 8:25 PM, Christian Hopps <[email protected]> wrote:
>>>>
>>>> I understand the proposal. As I've stated elsewhere, I do not 
>>>> believe there is a problem here that needs solving. The "problem" 
>>>> was created by the user by summarizing prefixes that should not 
>>>> have been summarized -- they mis-configured their network. The 
>>>> routing protocols work just fine (act very quickly) if you don't 
>>>> incorrectly summarize "really important prefixes".
>>>>
>>>> I was simply pointing out that IGPs also don't deal in liveness since that 
>>>> keeps coming up.
>>>
>>> Sorry that was "as wg member".
>>>
>>>>
>>>> Thanks,
>>>> Chris.
>>>>
>>>>>> On Jan 14, 2022, at 8:06 PM, Aijun Wang <[email protected]> 
>>>>>> wrote:
>>>>>
>>>>> Hi, Christian and John:
>>>>>
>>>>> No. I think you both may have misunderstood the proposal. What we are detecting 
>>>>> is actually the reachability/liveness of the node that is connected to the 
>>>>> application, not the application itself.
>>>>> And I think node liveness is the same as node reachability. Either way, if 
>>>>> a node's forwarding function fails, the path to its connected service is 
>>>>> affected or broken.
>>>>>
>>>>> Aijun Wang
>>>>> China Telecom
>>>>>
>>>>>> On Jan 15, 2022, at 08:56, Christian Hopps <[email protected]> wrote:
>>>>>>
>>>>>> Indeed, and in fact the IGP should only be dealing with the 
>>>>>> reachability to the node, not with the liveness of the node or its applications.
>>>>>>
>>>>>> Thanks,
>>>>>> Chris.
>>>>>> [as wg member]
>>>>>>
>>>>>>> On Jan 14, 2022, at 7:47 PM, John E Drake <[email protected]> wrote:
>>>>>>>
>>>>>>> I don’t think so. Today things just work, at a given time scale. 
>>>>>>> What you said you are trying to do is reduce the time scale for 
>>>>>>> detecting that an application on a node has failed. However, 
>>>>>>> conflating the health of a node with the health of an 
>>>>>>> application on that node seems to be inherently flawed.
>>>>>>>
>>>>>>> Yours Irrespectively,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> Juniper Business Use Only
>>>>>>> From: Aijun Wang <[email protected]>
>>>>>>> Sent: Friday, January 14, 2022 7:40 PM
>>>>>>> To: John E Drake <[email protected]>
>>>>>>> Cc: Les Ginsberg (ginsberg) <[email protected]>; Robert Raszuk 
>>>>>>> <[email protected]>; Christian Hopps <[email protected]>; 
>>>>>>> Shraddha Hegde <[email protected]>; Tony Li 
>>>>>>> <[email protected]>; Hannes Gredler <[email protected]>; lsr 
>>>>>>> <[email protected]>; Peter Psenak (ppsenak) <[email protected]>
>>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>>
>>>>>>> [External Email. Be cautious of content]
>>>>>>>
>>>>>>>>> When the node is up, all subsequent processing is handed over to the 
>>>>>>>>> application layer. This is the normal procedure the IGP should follow.
>>>>>>>>> By your logic, are IGPs solving the wrong problem now?
>>>>>>>
>>>>>>> Aijun Wang
>>>>>>> China Telecom
>>>>>>>
>>>>>>>
>>>>>>> On Jan 15, 2022, at 08:30, John E Drake 
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> 
>>>>>>> Correct, but as Tony, Robert and I have noted, a node being up 
>>>>>>> does not mean that an application on that node is up, which 
>>>>>>> means that your proposed solution is probably a solution to the 
>>>>>>> wrong problem. Further, Robert’s solution is probably a solution to the 
>>>>>>> right problem.
>>>>>>>
>>>>>>> Yours Irrespectively,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> Juniper Business Use Only
>>>>>>> From: Aijun Wang <[email protected]>
>>>>>>> Sent: Friday, January 14, 2022 5:53 PM
>>>>>>> To: John E Drake <[email protected]>
>>>>>>> Cc: Robert Raszuk <[email protected]>; Les Ginsberg (ginsberg) 
>>>>>>> <[email protected]>; Christian Hopps <[email protected]>; 
>>>>>>> Shraddha Hegde <[email protected]>; Tony Li 
>>>>>>> <[email protected]>; Hannes Gredler <[email protected]>; lsr 
>>>>>>> <[email protected]>; Peter Psenak (ppsenak) <[email protected]>
>>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>>
>>>>>>> Hi, John:
>>>>>>>>> Please note that if the node is down, the service cannot be accessed.
>>>>>>> We are discussing the “DOWN” notification, not the “UP” notification.
>>>>>>>
>>>>>>> Aijun Wang
>>>>>>> China Telecom
>>>>>>>
>>>>>>>
>>>>>>> On Jan 15, 2022, at 00:25, John E Drake 
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> 
>>>>>>> Hi,
>>>>>>>
>>>>>>> Comment inline below.
>>>>>>>
>>>>>>> Yours Irrespectively,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> Juniper Business Use Only
>>>>>>> From: Lsr <[email protected]> On Behalf Of Robert Raszuk
>>>>>>> Sent: Monday, January 10, 2022 7:15 PM
>>>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
>>>>>>> Cc: Christian Hopps <[email protected]>; Aijun Wang 
>>>>>>> <[email protected]>; Shraddha Hegde 
>>>>>>> <[email protected]>; Tony Li <[email protected]>; Hannes 
>>>>>>> Gredler <[email protected]>; lsr <[email protected]>; Peter Psenak 
>>>>>>> (ppsenak) <[email protected]>
>>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>>
>>>>>>> Hi Les,
>>>>>>>
>>>>>>>> You seem focused on the notification delivery mechanism only.
>>>>>>>
>>>>>>> Not really. For me, an advertised summary is like a prefix when you are 
>>>>>>> dialing a country code. Call signaling knows to go north if you are 
>>>>>>> calling a crab shop in Alaska.
>>>>>>>
>>>>>>> Now such direction does not indicate if the shop is open or has crabs.
>>>>>>>
>>>>>>>>> You need to get that info over the top, as a service. So I am much more 
>>>>>>>>> in favor of having the service tell you, directly or indirectly, that it 
>>>>>>>>> is available.
>>>>>>>
>>>>>>> [JD]  Right.  Just because a node is up and connected to the network 
>>>>>>> does not imply that a given application is active on it.
>>>>>>>
>>>>>>> Best,
>>>>>>> R.
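The distinction drawn here between node reachability and application availability can be sketched with a toy Python model. It is a hypothetical illustration, not part of any proposal in this thread: the `Node` class, its fields and the service name `vpn-a` are invented for the example. It shows the case raised by Robert and John (node up, application down), which only a service-level signal can reveal, and the case raised by Aijun (node down), where both views agree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy PE model (hypothetical): forwarding state plus per-service state."""
    loopback: str
    forwarding_up: bool = True
    services: dict = field(default_factory=dict)  # service name -> running?


def igp_reachable(node):
    # All the IGP can report is whether a path to the node exists.
    return node.forwarding_up


def service_available(node, name):
    # Over-the-top view: the service must be running on a reachable node.
    return igp_reachable(node) and node.services.get(name, False)


pe = Node("10.0.1.1", services={"vpn-a": True})

# The application fails while the node keeps forwarding.
pe.services["vpn-a"] = False
app_down = (igp_reachable(pe), service_available(pe, "vpn-a"))
print(app_down)  # (True, False)

# The whole node fails: both views now agree.
pe.forwarding_up = False
node_down = (igp_reachable(pe), service_available(pe, "vpn-a"))
print(node_down)  # (False, False)
```

In the first case an IGP-only signal would still report the node as usable; in the second, a node-down notification and a service-level signal give the same answer.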
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 11, 2022 at 1:07 AM Les Ginsberg (ginsberg) 
>>>>>>> <[email protected]> wrote:
>>>>>>> Robert -
>>>>>>>
>>>>>>> From: Robert Raszuk <[email protected]>
>>>>>>> Sent: Monday, January 10, 2022 2:56 PM
>>>>>>> To: Les Ginsberg (ginsberg) <[email protected]>
>>>>>>> Cc: Tony Li <[email protected]>; Christian Hopps 
>>>>>>> <[email protected]>; Peter Psenak (ppsenak) <[email protected]>; 
>>>>>>> Shraddha Hegde <[email protected]>; Aijun Wang 
>>>>>>> <[email protected]>; Hannes Gredler <[email protected]>; 
>>>>>>> lsr <[email protected]>
>>>>>>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>>>>
>>>>>>> Les,
>>>>>>>
>>>>>>> We have received requests from real customers who both need to 
>>>>>>> summarize AND would like better response time to loss of reachability 
>>>>>>> to individual nodes.
>>>>>>>
>>>>>>> We all agree the request is legitimate.
>>>>>>>
>>>>>>> [LES:] It does not seem to me that everyone does agree on that – but I 
>>>>>>> appreciate that you agree.
>>>>>>>
>>>>>>>>> But do they realize that practically employing what you are 
>>>>>>>>> proposing (new PDU flooding) requires a 100% software upgrade of 
>>>>>>>>> all IGP nodes in the entire network? Do they also realize that 
>>>>>>>>> using it effectively requires a data plane change (software, sure, 
>>>>>>>>> but data plane code is not as simple as PI) on all ingress PEs?
>>>>>>>
>>>>>>>>> [LES:] As far as forwarding is concerned, as Peter has indicated, we have a POC and 
>>>>>>>>> it works fine. And there are many possible ways for implementations to 
>>>>>>>>> go.
>>>>>>>>> It may or may not require a 100% software upgrade, but I agree that a 
>>>>>>>>> significant number of nodes have to be upgraded to at least support 
>>>>>>>>> PULSE flooding.
>>>>>>>
>>>>>>>
>>>>>>>>> And with the scale requirements you are describing, it seems this 
>>>>>>>>> would be 1000s of nodes (if not more). That's massive compared to 
>>>>>>>>> alternative approaches that achieve the same or even better 
>>>>>>>>> results.
>>>>>>>
>>>>>>>>> [LES:] I'd be happy to review other solutions if/when someone writes them 
>>>>>>>>> up.
>>>>>>> I think what is overlooked in the discussion of other solutions 
>>>>>>> is that reachability info is provided by the IGP. If all the IGP 
>>>>>>> advertises is a summary then how would individual loss of 
>>>>>>> reachability become known at scale?
>>>>>>> You seem focused on the notification delivery mechanism only.
>>>>>>>
>>>>>>> Les
>>>>>>>
>>>>>>> Many thx,
>>>>>>> Robert
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lsr mailing list
>>>>>>> [email protected]
>>>>>>> https://www.ietf.org/mailman/listinfo/lsr
>>>>>>
>>>>>
>>>>
>>>
>>
>