[Lsr] Re: A counter example

Tony Przygienda Thu, 21 Nov 2024 03:50:07 -0800

Robert, in case of using -prz- you don't have to go either way. You can
just roll over node by node and you are guaranteed that the whole graph
will be _sufficiently_ covered at all times. Both components will do
their own "optimal" CDS, but as TonyL points out there is -always- a chance
you will see much less than "optimal" flooding during such a transition,
since on the component edges you have full flooding ("optimal" meaning here
really "minimum number of links covering a CDS", I guess). From an OPEX
standpoint that is probably the most appealing option (though in _most_ very
large networks the discussion is more about -double- coverage and avoiding
articulations, IME, than anything about a "minimal number of links CDS"). Of
course a large IP fabric and a large SP network are not the same thing and
different "optimality" may be desired ;-)
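
To make the "sufficient, but not necessarily doubly covered" point concrete,
here is a minimal sketch of TonyL's three-link example (quoted further down)
combined with the boundary rule as described in this thread: a link always
floods when its two ends run different algorithms. Everything else is an
assumption for illustration only: the triangle inside each half, the names
`flooded` and `connected`, and the reduction of algorithms X and Y to "which
cross links they keep". Pruning inside a half is not modeled at all.

```python
from collections import deque

# Hypothetical topology: each half is a triangle (an assumption; the quoted
# example only says that the halves are joined by three links).
intra = [("A1", "A2"), ("A2", "A3"), ("A1", "A3"),
         ("B1", "B2"), ("B2", "B3"), ("B1", "B3")]
cross = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]

# A1 and B1 run X, all other nodes run Y (as in the quoted example).
algo = {n: "Y" for link in intra for n in link}
algo["A1"] = algo["B1"] = "X"

# Cross links each algorithm would keep: X keeps links 2 and 3,
# Y keeps links 1 and 2.
keeps = {"X": {cross[1], cross[2]}, "Y": {cross[0], cross[1]}}

def flooded(link):
    a, b = link
    if algo[a] != algo[b]:
        return True                    # boundary rule: always flood
    if link in cross:
        return link in keeps[algo[a]]  # both ends agree, their algorithm decides
    return True                        # simplification: no pruning inside a half

def connected(links, nodes):
    """Breadth-first search: True if 'links' connect all 'nodes'."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for m in adj[queue.popleft()] - seen:
            seen.add(m)
            queue.append(m)
    return seen == set(nodes)

topology = [l for l in intra + cross if flooded(l)]
cross_kept = [l for l in cross if flooded(l)]
print("connected:", connected(topology, algo))
print("cross links kept:", cross_kept)
```

Under these assumptions the flooding topology stays connected, i.e. flooding
is "sufficient", yet (A2, B2) is the only link left between the two halves,
which is exactly the single point of failure TonyL describes.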


If you don't want to mix algorithms at _any_ time then A is fine. It is more
OPEX intensive of course, and at a certain point there is full flooding again
(which may melt the network if it relies on flooding throughput that is
achievable only with reduction in place). The leader solution (discarding all
other considerations) simplifies it, since the rollover needs to touch every
node just once and then the leader once, and the network never "fully
floods", though the blast radius of a leader reconfig is the full network
(all nodes fall over at the same time). Customers have voiced this as a
concern, given the increasing number of incidents in recent years: part of it
is scale, part of it is automation, but often they are only possible because
a single node has a network-wide blast radius when touched.
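
A rough way to compare the options is to count, at each migration step, how
many links are forced into full flooding. The sketch below assumes a
hypothetical six-node ring with two chords and migrates nodes in name order;
the topology and the helper `forced_flood_links` are illustrative, and no
real pruning algorithm is modeled. In the mixed rollover only links whose
ends run different algorithms are forced to flood, whereas Scenario A passes
through a state in which every link floods.

```python
# Hypothetical topology: a six-node ring plus two chords.
nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]
links = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5"),
         ("n5", "n6"), ("n6", "n1"), ("n1", "n4"), ("n2", "n5")]

def forced_flood_links(algo):
    """Links that must flood because their two ends run different algorithms."""
    return [(a, b) for a, b in links if algo[a] != algo[b]]

# Mixed rollover: move one node at a time from algorithm "X" to "Y".
algo = {n: "X" for n in nodes}
per_step = []
for n in nodes:
    algo[n] = "Y"
    per_step.append(len(forced_flood_links(algo)))

# Scenario A: all nodes go to full flooding before the second algorithm is
# enabled, so at that point every link floods.
scenario_a_peak = len(links)

print("forced-flood links per rollover step:", per_step)
print("peak in mixed rollover:", max(per_step), "of", len(links))
print("peak in Scenario A:", scenario_a_peak, "of", len(links))
```

The per-step count is only a lower bound on what actually floods, since each
algorithm also keeps the links of its own CDS, and the numbers depend on the
topology and the migration order. In the extreme case where every link sits
on a boundary, the mixed rollover degenerates into full flooding as well.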

Scenario B is juggling chainsaws. Some people are very good at it, those
that are not are not at it for significant periods of time ;-)

my 2c

-- tony

Footnote: the -prz- thingy allows one to use the fact that a node _not_
advertising any supported algorithm is a zero prunner, and hence that _may_
be leveraged by the algorithm on the other side, but that's a different
kettle of fish discussion.

On Thu, Nov 21, 2024 at 10:47 AM Robert Raszuk <[email protected]> wrote:

> Hi Tony,
>
> > And it's perfectly fine AFAIS if the WG decides that working on
> > "multiple algorithm mix in the network" is not something to be pursued,
>
> Scenario - A:
>
> Are you saying that when there is a need for migration from one algorithm
> to another the recommended prescription would be to first migrate to full
> flooding and then enable the 2nd algorithm, such that the two algorithms
> never co-exist?
>
> Or
>
> Scenario - B:
>
> Are you saying that it would be fine for two algorithms to co-exist during
> such a migration (for a limited time under the operator's close control) in
> spite of the chance of creating a single point of failure? If the latter, I
> presume a good implementation will allow one to manually force flooding on
> a link even when the algorithm suppresses it during such special migration
> times.
>
> Personally I would be in favor of scenario B.
>
> Thx,
> Robert
>
>
> On Thu, Nov 21, 2024 at 10:16 AM Tony Przygienda <[email protected]>
> wrote:
>
>> Thanks Tony, good drill down. I see two points here:
>>
>> 1. the point I take here is that in the example the flooding resulting
>> from the prunner framework covers the full graph, i.e. correctness as in
>> "sufficient flooding" is still assured.
>> 2. the solution may be _not_ optimal in terms of constructing a single
>> CDS, i.e. on the boundaries basically full flooding is mandated by the
>> prunner framework. Actually the most extreme case is where _every_ node in
>> the network runs a different algorithm and the prunner framework says
>> "well, flood on all links with different algorithm on the other side". Then
>> it all collapses into full flooding again.
>>
>> If that's my correct reading then please observe that the -prz- draft
>> does NOT state that in mixed algorithm scenarios _optimal_ flooding in any
>> sense is guaranteed (optimality here seems to mean "CDS with minimal number
>> of links"), it only says that prunner framework will guarantee "sufficient"
>> flooding to build an overall CDS, not less and not more. In fact that's the
>> paragraph that is possibly a bit cryptic to most, saying that you'd need a
>> "meta-prunner" algorithm for such stuff or synchronization of boundaries of
>> a component (funny enough, the considerations in such design start to be
>> closely related to arbitrary hierarchy principles ;-). There are other
>> considerations but they become even more arcane and AFAIS achieving an
>> "optimal" CDS when components with multiple algorithms are mixed is in
>> pragmatic terms not possible.
>>
>> So, if we agree that the prunner framework (i.e. mixing multiple
>> algorithms) does guarantee "sufficient" flooding (i.e. full CDS) but does
>> NOT guarantee any "only necessary" flooding then we're in sync. And it's
>> perfectly fine AFAIS if the WG decides that working on "multiple algorithm
>> mix in the network" is not something to be pursued, it will be sufficient
>> then to e.g. extend the 97xx to provide leader-based and leaderless
>> signalling as two options (just like there is centralized computed and
>> distributed already) and say that "mixing both modes or multiple algorithms
>> under leaderless is outside the scope of the document". Not every problem
>> under the sun needs to be solved by a WG and practical implications of such
>> scope limitations AFAIS are limited since for practical purposes mixing
>> limits blast radius on migrations and nothing else really AFAIU ;-)
>>
>> So I guess I wasn't specific enough when I said that I don't see a
>> counter example for -prz- framework not being correct. By correctness I
>> always meant "any mix of algorithms being prunners in a network will always
>> deliver _sufficient_ flooding" and did not imply any kind of flooding
>> optimality.
>>
>> thanks
>>
>> --- tony
>>
>>
>> On Wed, Nov 20, 2024 at 10:57 PM Tony Li <[email protected]> wrote:
>>
>>>
>>> Hi all,
>>>
>>> Tony P. asked for a counter-example to why neighbor-only algorithm
>>> information is sufficient. This email attempts to articulate just such an
>>> example.
>>>
>>> Suppose that we have a bi-partite network, with two halves, A and B.
>>> Part A contains nodes A1, A2, A3, ….  Part B contains nodes B1, B2, B3, ….
>>>
>>> The two halves are connected by three links (A1, B1), (A2, B2), and (A3,
>>> B3).
>>>
>>> The correct flooding topology in this situation is to select exactly two
>>> of the three links. Selecting only one of the links would create a single
>>> point of failure. Selecting all three links leads to unacceptable and
>>> unnecessary flooding.
>>>
>>> Suppose that A1 and B1 are running algorithm X.  All other nodes are
>>> running algorithm Y.
>>>
>>> Suppose that under algorithm X, links 2 and 3 are selected.  Therefore,
>>> A1 and B1 choose to prune (A1, B1).  Further, suppose that under algorithm
>>> Y, links 1 and 2 are selected. Therefore, nodes A3 and B3 choose to prune
>>> (A3, B3).  Now, only (A2, B2) is selected, creating a single point of
>>> failure.
>>>
>>> The key points here are simple:
>>>
>>> - An algorithm makes assumptions about how other nodes in the topology
>>> are going to behave. If multiple algorithms are in play, those assumptions
>>> may not hold.
>>>
>>> - Two concurrent algorithms, while each individually correct, can still
>>> produce a flawed flooding topology if they are asked to interoperate.
>>>
>>> - Full flooding at the boundary between the algorithms is not sufficient
>>> to correct the situation.
>>>
>>> Regards,
>>> Tony
>>>
>>> _______________________________________________
>>> Lsr mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>>
>
