Re: [Lsr] BGP vs PUA/PULSE

Robert Raszuk Wed, 01 Dec 2021 16:13:00 -0800

Hi Aijun,

If you meant that paragraph:

   When only some of the ABRs can't reach the failure node/link, as that
   described in Section 3.2, the ABR that can reach the PUAM prefix
   should advertise one specific route to this PUAM prefix.  The
   internal routers within another area can then bypass the ABRs that
   can't reach the PUAM prefix, to reach the PUAM prefix.

If this is it, then I think this is the worst possible idea. Moreover, it
does not even work, as the PUA/PULSE has already propagated to remote PEs
and done "the damage". Remote PEs are sitting and waiting for the service
layer to reconverge.

You are now asking for self ABR-to-ABR state synchronization (even if only
by keeping an ear open to other ABRs' PUAs) and, based on that, for host
route injection and domain-wide leaking of those artificial host routes by
the other ABRs of a given area. That not only does not help but will cause
even more churn domain wide.

I can think of a few much more elegant solutions, but I will let PULSE
authors come up with their own ideas :)

- -

Bottom line is that (putting aside all concerns already voiced by folks on
this list) both the PULSE and PUA ideas can perhaps be made to work for the
case where a PE really goes down.

However, in situations where ABRs just think PEs are down when they are
not, i.e. where false-positive DOWN events are flooded domain wide, the
entire concept becomes much harder to handle in an elegant and scalable
way. And what I illustrated as the reason for such scenarios is just the
tip of the iceberg in the pile of reasons why an ABR may think PEs went
down. There are many more...
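
To make the false-positive guard quoted further down in this thread more
concrete (act on a PULSE only when all sources of the summary inject an
identical PULSE), here is a minimal Python sketch. It is only an
illustration under stated assumptions: the function name, the ABR names and
the data structures are hypothetical and are not taken from either draft.

```python
from typing import Dict, Set


def should_act_on_pulse(
    prefix: str,
    summary_sources: Set[str],
    pulses_by_abr: Dict[str, Set[str]],
) -> bool:
    """Hypothetical unanimity check for a received unreachability signal.

    prefix          -- the host prefix that was signaled as unreachable
    summary_sources -- ABRs currently originating the covering summary
    pulses_by_abr   -- for each ABR, the set of prefixes it has pulsed

    Returns True only if every ABR that originates the summary has
    signaled the same prefix. A single ABR with a broken L1 line card
    therefore cannot stop service on its own.
    """
    if not summary_sources:
        # Assumption: with no summary source known, do nothing.
        return False
    return all(
        prefix in pulses_by_abr.get(abr, set()) for abr in summary_sources
    )


# Two ABRs originate the summary; only ABR1 has pulsed the PE loopback.
sources = {"ABR1", "ABR2"}
pulses = {"ABR1": {"192.0.2.1/32"}}
print(should_act_on_pulse("192.0.2.1/32", sources, pulses))

# Once ABR2 signals the same prefix, the receiver may act on it.
pulses["ABR2"] = {"192.0.2.1/32"}
print(should_act_on_pulse("192.0.2.1/32", sources, pulses))
```

The first call returns False and the second returns True. Note that such a
check requires every receiver to know which ABRs originate the summary,
which is part of the extra complexity discussed here.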

Kind regards,
Robert

On Thu, Dec 2, 2021 at 12:58 AM Aijun Wang <[email protected]>
wrote:

> Hi, Robert:
>
> Aijun Wang
> China Telecom
>
> On Dec 2, 2021, at 04:42, Robert Raszuk <[email protected]> wrote:
>
> Apologies 2 corrections:
>
> 1)  s/to their inter-as/ to their inter-area/
>
> 2)  "service stops for configured PULSE timeout (as discussed 200 sec)."
> Actually, in the described case it is much worse: service to such an area
> stops forever, as the service layer may not be aware of this kind of false
> positive at all!
>
> Btw, this is also not an implementation detail, as ABRs from all vendors
> had better work in the same manner.
>
> And the robust solution to this case seems to be along the lines of the
> logic you have described. PULSEs must be acted on by L2 ABRs or by remote
> PEs *only* when all sources of the summaries inject an identical PULSE.
>
>
> [WAJ]
> https://datatracker.ietf.org/doc/html/draft-wang-lsr-prefix-unreachable-annoucement-08#section-4
> describes such situations. I also presented it at the IETF 112 meeting.
> Please see the last paragraph of that section.
>
>
> That makes the feature a bit more complex ....
>
> Thx,
> R.
>
> On Wed, Dec 1, 2021 at 9:25 PM Robert Raszuk <[email protected]> wrote:
>
>> Hi Tony,
>>
>> I have been thinking about your email a bit more. Actually the
>> destructive issue you have described can happen not only in the case of
>> partitioned L1 areas.
>>
>> *Deployment scenario: *
>>
>> It is quite often the case that ABRs' intra-area connectivity is very
>> different to their inter-as connections. That usually means that different
>> line cards are used to connect to other routers in the local area than
>> to those in the core area.
>>
>> So when anything happens to the line card which connects to L1 (for
>> example it goes down, there is massive congestion, the protocol queue is
>> full etc ...), then once previously received LSPs expire such an ABR may
>> trigger a PULSE for all PE routers domain wide. And all the fuses
>> discussed to prevent massive flooding will not kick in, as there may be
>> just, say, 10 PEs in the area - all working just fine.
>>
>> The other ABRs will happily continue to inject summaries but service
>> stops for configured PULSE timeout (as discussed 200 sec). Note that it is
>> a full service stop, not a switch to a backup path, as all PEs in the
>> area were PULSED domain wide. Not good.
>>
>> I have not seen any discussion about such a failure case so far. And only
>> your mail triggered it !
>>
>> Many thx,
>> R.
>>
>

_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr
