Hi, Robert:

Aijun Wang
China Telecom

> On Dec 2, 2021, at 04:42, Robert Raszuk <[email protected]> wrote:
>
> Apologies 2 corrections:
>
> 1) s/to their inter-as/ to their inter-area/
>
> 2) "service stops for configured PULSE timeout (as discussed 200 sec)."
> Actually in the described case it is much worse ... Service stops forever to
> such area as service layer may not be at all aware about this kind of false
> positive !
>
> Btw this is also not an implementation detail as all multi vendor ABRs better
> work in the same manner.
>
> And the robust solution to this case seems to be along the lines of the logic
> you have described. PULSES must be acted on by L2 ABRs or by remote PEs
> *only* when all sources of the summaries inject identical PULSE.

[WAJ] https://datatracker.ietf.org/doc/html/draft-wang-lsr-prefix-unreachable-annoucement-08#section-4
describes such situations. I also presented it at the IETF 112 meeting.
Please see the last paragraph of that section.

>
> That makes the feature a bit more complex ....
>
> Thx,
> R.
>
>
>> On Wed, Dec 1, 2021 at 9:25 PM Robert Raszuk <[email protected]> wrote:
>> Hi Tony,
>>
>> I have been thinking about your email a bit more. Actually the destructive
>> issue you have described can happen not only in the case of partitioned L1
>> areas.
>>
>> Deployment scenario:
>>
>> It is quite often the case that ABRs connectivity intra-area are very
>> different to their inter-as connections. That usually means that different
>> line cards are used to connect to other routers in the local area then those
>> in the core area.
>>
>> So when anything happens to the line card which connects L1 (for example it
>> goes down, there is massive congestion, protocol queue is full etc ...) when
>> previously received LSPs expire such ABR may trigger PULSE of all PE routers
>> domain wide. And all the fuses discussed to prevent massive flooding will
>> not kick in as there may be just say 10 PEs in the area - all working just
>> fine.
>>
>> The other ABRs will happily continue to inject summaries but service stops
>> for configured PULSE timeout (as discussed 200 sec). Note that it is full
>> service stop not switching to a backup path as all PEs in the area PULSED
>> domain wide. Not good.
>>
>> I have not seen any discussion about such a failure case so far. And only
>> your mail triggered it !
>>
>> Many thx,
>> R.
>>
>>
>>> On Wed, Dec 1, 2021 at 5:04 PM Robert Raszuk <[email protected]> wrote:
>>> Hi Tony,
>>>
>>> On #2 I you are right in the case of src L1 getting partitioned. Yes it
>>> will kill anycast design. If this is showstopper ... not sure. AFAIK only
>>> sourcing ABRs need to keep track about all links to PE to be down. That
>>> requirement does not propagate any further upstream.
>>>
>>> Thx
>>>
>>> On Wed, Dec 1, 2021 at 4:58 PM Tony Przygienda <[email protected]> wrote:
>>>> 1. my question is different. why does the draft say that seqnr# & IDs have
>>>> to be preserved between restarts
>>>>
>>>> 2. I'm still concerned about L1/L2 hierarchy. If an L2 border sees same
>>>> prefix negative pulses from two different L1/L2s it still has to keep
>>>> state to only pulse into L1 after _all_ the guys pulsed negative (which is
>>>> basically impossible since the _negative_ cannot persist it seems). Now
>>>> how will it even know that? it has to keep track who advertised the same
>>>> summary & who pulsed or otherwise it will pulse on anyone with a summary
>>>> giving a pulse and with that anycast won't work AFAIS and worse you get
>>>> into weird situations where you have 2 L1/L2 into same L1 area, one lost
>>>> link to reach the PE (arguably L1 got partitioned) and pulses & then the
>>>> L1/L2 on the border of the down L1 pulses and tears the session down
>>>> albeit the prefix is perfectly reachable through the other L1/L2. I assume
>>>> that parses for the connoscenti ...
>>>>
>>>> -=--- tony
>>>>
>>>>> On Wed, Dec 1, 2021 at 4:00 PM Peter Psenak <[email protected]> wrote:
>>>>> Tony,
>>>>>
>>>>> On 01/12/2021 15:31, Tony Przygienda wrote:
>>>>>
>>>>> >
>>>>> > Or maybe I missed something in the draft or between the lines in the
>>>>> > whole thing ... Do we assume the negative just quickly tears down the
>>>>> > BGP session & then it loses any relevance and we rely on BGP to retry
>>>>> > after reset automatically or something?
>>>>>
>>>>> yes.
>>>>>
>>>>> > But then why do we even care about retaining the LSP IDs & SeqNr# would
>>>>> > I ask?
>>>>>
>>>>> it's used for the purpose of flooding, so that during the flooding you
>>>>> do not flood the same pulse LSP multiple times.
>>>>>
>>>>> thanks,
>>>>> Peter
>>>>>
>>>>>
>>>>> >
>>>>> > -- tony
>>>>> >
>>>>> >
>>>>> > On Tue, Nov 30, 2021 at 11:19 PM Les Ginsberg (ginsberg)
>>>>> > <[email protected]> wrote:
>>>>> >
>>>>> > Hannes -
>>>>> >
>>>>> > Please see
>>>>> >
>>>>> > https://datatracker.ietf.org/doc/html/draft-ppsenak-lsr-igp-event-notification-00#section-4.1
>>>>> >
>>>>> > The new Pulse LSPs don't have remaining lifetime - quite
>>>>> > intentionally.
>>>>> > They are only retained long enough to support flooding.
>>>>> >
>>>>> > But, you remind me that we need to specify how the checksum is
>>>>> > calculated. Will do that in the next revision.
>>>>> >
>>>>> > Thanx.
>>>>> >
>>>>> > Les
>>>>> >
>>>>> > > -----Original Message-----
>>>>> > > From: Hannes Gredler <[email protected]>
>>>>> > > Sent: Tuesday, November 30, 2021 11:22 AM
>>>>> > > To: Peter Psenak (ppsenak) <[email protected]>
>>>>> > > Cc: Robert Raszuk <[email protected]>; Les Ginsberg (ginsberg)
>>>>> > > <[email protected]>; Aijun Wang <[email protected]>; lsr
>>>>> > > <[email protected]>; Tony Li <[email protected]>; Shraddha Hegde
>>>>> > > <[email protected]>
>>>>> > > Subject: Re: [Lsr] BGP vs PUA/PULSE
>>>>> > >
>>>>> > > hi peter,
>>>>> > >
>>>>> > > Just curious: Do you have an idea how to make short-lived LSPs
>>>>> > > compatible with the problem stated in
>>>>> > > https://datatracker.ietf.org/doc/html/rfc7987
>>>>> > >
>>>>> > > Would like to hear your thoughts on that.
>>>>> > >
>>>>> > > thanks,
>>>>> > >
>>>>> > > /hannes
>>>>> > >
>>>>> > > On Tue, Nov 30, 2021 at 01:15:04PM +0100, Peter Psenak wrote:
>>>>> > > | Hi Robert,
>>>>> > > |
>>>>> > > | On 30/11/2021 12:40, Robert Raszuk wrote:
>>>>> > > | > Hey Peter,
>>>>> > > | >
>>>>> > > | > > #1 - I am not ok with the ephemeral nature of the
>>>>> > > | > > advertisements. (I proposed an alternative).
>>>>> > > | >
>>>>> > > | >     LSPs have their age today. One can generate LSP with the
>>>>> > > | >     lifetime of 1 min. Protocol already allows that.
>>>>> > > | >
>>>>> > > | >
>>>>> > > | > That's a pretty clever comparison indeed. I had a feeling it
>>>>> > > | > will come up here and here you go :)
>>>>> > > | >
>>>>> > > | > But I am afraid this is not comparing apple to apples.
>>>>> > > | >
>>>>> > > | > In LSPs or LSA flooding you have a bunch of mechanisms to
>>>>> > > | > make sure the information stays fresh
>>>>> > > | > and does not time out. And the default refresh in ISIS if I
>>>>> > > | > recall was something like 15 minutes ?
>>>>> > > |
>>>>> > > | yes, default refresh is 900 for the default lifetime of 1200
>>>>> > > | sec. Most people change both to much larger values.
>>>>> > > |
>>>>> > > | If I send the LSP with the lifetime of 1 min, there will never
>>>>> > > | be any refresh of it. It will last 1 min and then will be purged
>>>>> > > | and removed from the database. The only difference with the
>>>>> > > | Pulse LSP is that it is not purged to avoid additional flooding.
>>>>> > > |
>>>>> > > |
>>>>> > > | >
>>>>> > > | >     Today in all MPLS networks host routes from all areas are
>>>>> > > | >     "spread" everywhere including all P and PE routers, that's
>>>>> > > | >     how LS protocols distribute data, we have no other way to
>>>>> > > | >     do that in LS IGPs.
>>>>> > > | >
>>>>> > > | >
>>>>> > > | > Can't you run OSPF over GRE ? For ISIS Henk had proposal not
>>>>> > > | > so long ago to run it over TCP too.
>>>>> > > | >
>>>>> > > | > https://datatracker.ietf.org/doc/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
>>>>> > > |
>>>>> > > | you can run anything over GRE, including IGPs, and you don't
>>>>> > > | need TCP transport for that. I don't see the relevance here. Are
>>>>> > > | you suggesting to create GRE tunnels to all PEs that need the
>>>>> > > | pulses? Nah, that would be an ugly requirement.
>>>>> > > |
>>>>> > > | thanks,
>>>>> > > | Peter
>>>>> > > |
>>>>> > > |
>>>>> > > | >
>>>>> > > | > Seems like a perfect fit !
>>>>> > > | >
>>>>> > > | > Thx,
>>>>> > > | > R.
>>>>> > > |
>>>>> >
>>>>> > _______________________________________________
>>>>> > Lsr mailing list
>>>>> > [email protected]
>>>>> > https://www.ietf.org/mailman/listinfo/lsr
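
For illustration only, here is a minimal Python sketch of the rule raised in
this thread: an L2 ABR or remote PE acts on a PULSE only once every ABR that
advertises the covering summary has pulsed the same prefix. The class name,
the method names, and the hold_time parameter are hypothetical and come from
neither draft. Because Pulse LSPs are not retained, the sketch assumes the
receiver keeps its own short-lived record of who pulsed and when.

```python
from collections import defaultdict


class PulseTracker:
    """Hypothetical per-receiver bookkeeping; not taken from either draft."""

    def __init__(self, hold_time):
        # Assumption: seconds for which a received pulse is still counted.
        self.hold_time = hold_time
        # summary prefix -> set of ABR IDs currently advertising it
        self.summary_sources = defaultdict(set)
        # (summary, host prefix) -> {ABR ID: time the pulse was received}
        self.pulsed_at = defaultdict(dict)

    def add_summary(self, summary, abr):
        """Record that `abr` advertises `summary`."""
        self.summary_sources[summary].add(abr)

    def on_pulse(self, summary, prefix, abr, now):
        """Record a pulse; return True only if all summary sources agree."""
        sources = self.summary_sources.get(summary, set())
        if abr not in sources:
            # Pulse from a router that does not source this summary: ignore.
            return False
        seen = self.pulsed_at[(summary, prefix)]
        seen[abr] = now
        # Count only pulses that are still within the hold time.
        fresh = {a for a, t in seen.items() if now - t <= self.hold_time}
        # Act only when every advertising ABR has a fresh pulse.
        return sources <= fresh


tracker = PulseTracker(hold_time=5.0)
tracker.add_summary("10.0.0.0/16", "ABR1")
tracker.add_summary("10.0.0.0/16", "ABR2")

# One ABR lost its L1 line card and pulses; the other still reaches the PE.
first = tracker.on_pulse("10.0.0.0/16", "10.0.1.1/32", "ABR1", now=0.0)
# Only when the second ABR pulses too is the prefix treated as unreachable.
second = tracker.on_pulse("10.0.0.0/16", "10.0.1.1/32", "ABR2", now=1.0)
```

In this sketch the first call returns False and the second returns True, so
the false positive from a single ABR does not tear the service down. The
extra per-prefix state is the added complexity noted above.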
