Re: [Lsr] BGP vs PUA/PULSE

Aijun Wang Thu, 02 Dec 2021 04:05:24 -0800

Hi, Tony:

Aijun Wang
China Telecom


> On Dec 2, 2021, at 18:51, Tony Przygienda <[email protected]> wrote:
> 
> 
> Idly thinking about the stuff more and more issues pop up that confirm my 
> initial gut feeling that the pulse stuff is simply not what IGP can do 
> reasonably (i.e. liveliness). negative as liveliness indication is arguably 
> even worse ;-) but I think most of us agreed on that across those hundreds of 
> emails by now. 
> 
> So, to expound a bit. IGP reachability which IGP does normally is _very_ 
> different from liveliness and here's another example (I describe it in 
> principle but people who deployed stuff will know what scenarios I'm talking 
> about) 
> 
> So, in short, the fact that an IGP, let's say ABR, advertises a summary has 
> _nothing_ to do much with liveliness of what it summarizes in system wide 
> sense. In more specifics, even when this aggregate goes away or IGP cannot 
> compute _reachability_ to a specific address/node does NOT mean that the 
> prefix advertised by such node is not _alive_. 

[WAJ] The “DOWN” state just indicates that the mentioned prefixes are 
unreachable, not that they are _not alive_.

> 
> Imagine (often done in fact in deployments I dealt with) that the prefix 
> advertised by a node into IGP is not _reachable_ by IGP all of a sudden, 
> simplest case being a link loss of course. However, it is in the system still 
> reachable by means e.g. of a default route from another protocol or a 
> specific route (static?) over a link IGP is not running on. Now, if IGP 
> starts to pulse it will defeat the very purpose of such backup. 

[WAJ] Whether a prefix is reachable or unreachable is determined from the 
node’s route table, not from the IGP itself.
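
A minimal sketch of the distinction, assuming a hypothetical route table 
represented as (prefix, protocol) pairs; the function name, prefixes and 
protocol labels are illustrative only and do not come from any draft or 
implementation. It shows a longest-prefix-match lookup that still finds a 
route after the IGP host route is withdrawn, because a static default covers 
the address.

```python
import ipaddress

def best_route(rib, address):
    """Return (network, protocol) of the longest matching route, or None.

    rib is a hypothetical route table: a list of (prefix, protocol) pairs.
    """
    addr = ipaddress.ip_address(address)
    matches = []
    for prefix, proto in rib:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, proto))
    if not matches:
        return None
    # Longest prefix wins, regardless of which protocol installed it.
    return max(matches, key=lambda m: m[0].prefixlen)

# Before the failure: the IGP host route and a static default both cover the address.
rib_before = [("192.0.2.1/32", "isis"), ("0.0.0.0/0", "static")]
# After the failure: the IGP route is gone, the static default remains.
rib_after = [("0.0.0.0/0", "static")]

print(best_route(rib_before, "192.0.2.1"))
print(best_route(rib_after, "192.0.2.1"))
print(best_route([], "192.0.2.1"))
```

In this sketch the address only becomes unreachable when no route from any 
protocol covers it, which is the case the third call illustrates.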

> 
> And no, you cannot "know" whether backup is here, there are even funky cases 
> where a policy only installs a backup route if the primary went away which 
> may be fast enough to keep e.g. TCP up (whether it's the best possible 
> architecture is disputable but it's a fact of life that such stuff exists). 

[WAJ] How such information is consumed depends on the receiving node itself. 
The “DOWN” information just gives an indication to the overlay services 
running on top of the prefixes of interest.

> 
> So, basically we try to invent "liveliness indication" in IGP whereas IGP 
> cannot be aware whether the prefix is reachable system-wide through it even 
> when IGP lost _reachability_. 
[WAJ] The ABR has the knowledge to determine this.
> 
> And yes, before we go there, I know that with enough "limited domain" and 
> "limited scale" and "limited use case" arguments anything one can imagine 
> "works" ... 

[WAJ] IGPs are all deployed in limited domains.

> 
> --- tony 
> 
>> On Wed, Dec 1, 2021 at 8:13 PM Les Ginsberg (ginsberg) <[email protected]> 
>> wrote:
>> Tony –
>> 
>>  
>> 
>> Inline.
>> 
>>  
>> 
>> From: Tony Przygienda <[email protected]> 
>> Sent: Wednesday, December 1, 2021 9:33 AM
>> To: Les Ginsberg (ginsberg) <[email protected]>
>> Cc: Peter Psenak (ppsenak) <[email protected]>; Hannes Gredler 
>> <[email protected]>; lsr <[email protected]>; Tony Li <[email protected]>; Aijun 
>> Wang <[email protected]>; Robert Raszuk <[email protected]>; 
>> Shraddha Hegde <[email protected]>
>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>> 
>>  
>> 
>> "
>> 
>> Nodes which originate FSP-LSPs MUST
>>    remember the last sequence number used for a given FSP-LSP and
>>    increment the sequence number when generating a new version.
>>  
>>    FSP-LSP generation SHOULD utilize the "next" FSP-LSP ID each time new
>>    pulse information needs to be advertised i.e., if the most recent
>>    FSP-LSP ID used was A-00.n, the next set of pulse information SHOULD
>>    be advertised using FSP-LSP.ID A-00.n+1.  This minimizes the
>>    possibility of confusion if other routers in the network have not yet
>>    removed A-00.n from their LSPDB.
>> "
>> So you tell me I over-interpreted as "between restarts" ;-) OK, fine. Fair 
>> 'nuff. Maybe add one sentence clarification. 
>> [LES:] Sure.
>> Otherwise yeah, I'd like the draft to add the "in case of partition things 
>> may break but it's not much worse than before" ;-) and "assumption is that 
>> the overlay will retry after dropping session on negative so no positives 
>> are needed" and I'm ok with this thread. 
>> [LES:] I think significantly more needs to be said about the current use 
>> case for event notification – and this point can be part of that. Look for 
>> that in the next revision of the draft.
>> my big gripe about "don't do it in main ISIS, take service instance" remains 
>> though due to scalability concerns that bunch of senior folks here raised 
>> already 
>> [LES:] I am not in favor of a separate instance in this case. Reason being 
>> all of the information required to determine when to send pulses is already 
>> known by the main instance. Moving the pulse advertisements themselves to a 
>> separate instance would likely be more costly in resources on the routers 
>> themselves than advertising them in the main instance. Scale considerations 
>> need to be addressed – as has been stated in this and earlier threads many 
>> times – and that would be true regardless of whether we used the main 
>> instance or a separate instance. 
>> There is also the point made by Greg Mirsky early on in this discussion – 
>> that the use of event-notification needs to be carefully limited to cases 
>> that make sense for the main routing instance. The next revision of the 
>> draft will also address this point.
>>     Les
>> -- tony 
>>  
>> 
>> On Wed, Dec 1, 2021 at 5:52 PM Les Ginsberg (ginsberg) <[email protected]> 
>> wrote:
>> 
>> Tony –
>> 
>>  
>> 
>>  
>> 
>> From: Tony Przygienda <[email protected]> 
>> Sent: Wednesday, December 1, 2021 7:58 AM
>> To: Peter Psenak (ppsenak) <[email protected]>
>> Cc: Les Ginsberg (ginsberg) <[email protected]>; Hannes Gredler 
>> <[email protected]>; lsr <[email protected]>; Tony Li <[email protected]>; Aijun 
>> Wang <[email protected]>; Robert Raszuk <[email protected]>; 
>> Shraddha Hegde <[email protected]>
>> Subject: Re: [Lsr] BGP vs PUA/PULSE
>> 
>>  
>> 
>> 1. my question is different. why does the draft say that seqnr# & IDs have 
>> to be preserved between restarts
>> 
>>  
>> 
>>  
>> 
>> [LES:] Section 4.3.1 of the draft tries to answer your question – but there 
>> is no mention of “restart” there.
>> 
>> There is in fact no mention of restart anywhere in the draft other than to 
>> say pulses are not preserved across restarts.
>> 
>>  
>> 
>> We only retain the sequence numbers to make it easier to distinguish a new 
>> Pulse LSP from a retransmission.
>> 
>>  
>> 
>>  
>> 
>> 2. I'm still concerned about L1/L2 hierarchy. If an L2 border sees same 
>> prefix negative pulses from two different L1/L2s  it still has to keep state 
>> to only pulse into L1 after _all_ the guys pulsed negative (which is 
>> basically impossible since the _negative_ cannot persist it seems). Now how 
>> will it even know that? it has to keep track who advertised the same summary 
>> & who pulsed or otherwise it will pulse on anyone with a summary giving a 
>> pulse and with that anycast won't work AFAIS and worse you get into weird 
>> situations where you have 2 L1/L2 into same L1 area, one lost link to reach 
>> the PE (arguably L1 got partitioned) and pulses & then the L1/L2 on the 
>> border of the down L1 pulses and tears the session down albeit the prefix is 
>> perfectly reachable through the other L1/L2. I assume that parses for the 
>> cognoscenti ...
>> 
>>  
>> 
>> [LES:] We are not trying to handle the area partition case.
>> 
>> In such a case, even if nothing is done, traffic will flow via both ABRs and 
>> half of it will be dropped – so one could argue that switching BGP traffic 
>> to the backup path is still a good idea.
>> 
>>  
>> 
>>    Les
>> 
>>  
>> 
>> -=--- tony
>> 
>>  
>> 
>> On Wed, Dec 1, 2021 at 4:00 PM Peter Psenak <[email protected]> wrote:
>> 
>> Tony,
>> 
>> On 01/12/2021 15:31, Tony Przygienda wrote:
>> 
>> > 
>> > Or maybe I missed something in the draft or between the lines in the 
>> > whole thing ... Do we assume the negative just quickly tears down the 
>> > BGP session & then it loses any relevance and we rely on BGP to retry 
>> > after reset automatically or something? 
>> 
>> yes.
>> 
>> 
>> But then why do we even care about retaining the LSP IDs & SeqNr# would 
>> I ask?
>> 
>> it's used for the purpose of flooding, so that during the flooding you 
>> do not flood the same pulse LSP multiple times.
>> 
>> thanks,
>> Peter
>> 
>> 
>> > 
>> > -- tony
>> > 
>> > 
>> > 
>> > 
>> > 
>> > On Tue, Nov 30, 2021 at 11:19 PM Les Ginsberg (ginsberg) 
>> > <[email protected] 
>> > <mailto:[email protected]>> wrote:
>> > 
>> >     Hannes -
>> > 
>> >     Please see
>> >     
>> > https://datatracker.ietf.org/doc/html/draft-ppsenak-lsr-igp-event-notification-00#section-4.1
>> > 
>> >     The new Pulse LSPs don't have remaining lifetime - quite intentionally.
>> >     They are only retained long enough to support flooding.
>> > 
>> >     But, you remind me that we need to specify how the checksum is
>> >     calculated. Will do that in the next revision.
>> > 
>> >     Thanx.
>> > 
>> >          Les
>> > 
>> >      > -----Original Message-----
>> >      > From: Hannes Gredler <[email protected] <mailto:[email protected]>>
>> >      > Sent: Tuesday, November 30, 2021 11:22 AM
>> >      > To: Peter Psenak (ppsenak) <[email protected]
>> >     <mailto:[email protected]>>
>> >      > Cc: Robert Raszuk <[email protected] <mailto:[email protected]>>;
>> >     Les Ginsberg (ginsberg)
>> >      > <[email protected] <mailto:[email protected]>>; Aijun Wang
>> >     <[email protected] <mailto:[email protected]>>; lsr
>> >      > <[email protected] <mailto:[email protected]>>; Tony Li <[email protected]
>> >     <mailto:[email protected]>>; Shraddha Hegde
>> >      > <[email protected] <mailto:[email protected]>>
>> >      > Subject: Re: [Lsr] BGP vs PUA/PULSE
>> >      >
>> >      > hi peter,
>> >      >
>> >      > Just curious: Do you have an idea how to make short-lived LSPs
>> >     compatible
>> >      > with the problem stated in
>> >      > https://datatracker.ietf.org/doc/html/rfc7987
>> >      >
>> >      > Would like to hear your thoughts on that.
>> >      >
>> >      > thanks,
>> >      >
>> >      > /hannes
>> >      >
>> >      > On Tue, Nov 30, 2021 at 01:15:04PM +0100, Peter Psenak wrote:
>> >      > | Hi Robert,
>> >      > |
>> >      > | On 30/11/2021 12:40, Robert Raszuk wrote:
>> >      > | > Hey Peter,
>> >      > | >
>> >      > | >      > #1 - I am not ok with the ephemeral nature of the
>> >     advertisements. (I
>> >      > | >      > proposed an alternative).
>> >      > | >
>> >      > | >     LSPs have their age today. One can generate LSP with the
>> >     lifetime of 1
>> >      > | >     min. Protocol already allows that.
>> >      > | >
>> >      > | >
>> >      > | > That's a pretty clever comparison indeed. I had a feeling it
>> >     will come
>> >      > | > up here and here you go :)
>> >      > | >
>> >      > | > But I am afraid this is not comparing apple to apples.
>> >      > | >
>> >      > | > In LSPs or LSA flooding you have a bunch of mechanisms to
>> >     make sure the
>> >      > | > information stays fresh
>> >      > | > and does not time out. And the default refresh in ISIS if I
>> >     recall was
>> >      > | > something like 15 minutes ?
>> >      > |
>> >      > | yes, default refresh is 900 for the default lifetime of 1200
>> >     sec. Most
>> >      > | people change both to much larger values.
>> >      > |
>> >      > | If I send the LSP with the lifetime of 1 min, there will never
>> >     be any
>> >      > | refresh of it. It will last 1 min and then will be purged and
>> >     removed from
>> >      > | the database. The only difference with the Pulse LSP is that it
>> >     is not
>> >      > | purged to avoid additional flooding.
>> >      > |
>> >      > |
>> >      > | >
>> >      > | >     Today in all MPLS networks host routes from all areas are
>> >     "spread"
>> >      > | >     everywhere including all P and PE routers, that's how LS
>> >     protocols
>> >      > | >     distribute data, we have no other way to do that in LS IGPs.
>> >      > | >
>> >      > | >
>> >      > | > Can't you run OSPF over GRE ? For ISIS Henk had proposal not
>> >     so long ago
>> >      > | > to run it over TCP too.
>> >      > | >
>> >      > | > https://datatracker.ietf.org/doc/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
>> >      > |
>> >      > | you can run anything over GRE, including IGPs, and you don't
>> >     need TCP
>> >      > | transport for that. I don't see the relevance here. Are you
>> >     suggesting to
>> >      > | create GRE tunnels to all PEs that need the pulses? Nah, that
>> >     would be an
>> >      > | ugly requirement.
>> >      > |
>> >      > | thanks,
>> >      > | Peter
>> >      > |
>> >      > |
>> >      > | >
>> >      > | > Seems like a perfect fit !
>> >      > | >
>> >      > | > Thx,
>> >      > | > R.
>> >      > |
>> > 
>> >     _______________________________________________
>> >     Lsr mailing list
>> >     [email protected] <mailto:[email protected]>
>> >     https://www.ietf.org/mailman/listinfo/lsr
>> >
