Re: [Lsr] Open issues with Dynamic Flooding

Peter Psenak Tue, 05 Mar 2019 12:18:32 -0800

Robert,

On 05/03/2019 20:12 , Robert Raszuk wrote:

Slow convergence is obviously not a good thing


Could you please kindly elaborate why ?

With tons of ECMP in DCs or with a number of mechanisms for very fast data
plane repairs in WAN (well beyond FRR) IMHO any protocol *fast
convergence* is no longer a necessity. Yet many folks still talk about
it like the only possible rescue ...


we are talking about the control plane convergence, not data plane one.

If the flooding topology is a subset of the real topology, then at the
flooding level you don't have all the ECMPs available - you only have
two paths to reach any node. In such a case it is possible that the
flooding topology gets partitioned and you want to get out of that state
quickly, as you may get out of sync with reality and eventually
lose all the data plane ECMPs as a consequence.
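
To make the partition case concrete, here is a minimal Python sketch. The
3-spine/3-leaf fabric, the node names, the ring-shaped flooding topology
and the two failed links are all assumptions made up for illustration;
they are not taken from any draft or implementation. It shows a flooding
topology that gives two paths to every node getting partitioned by two
link failures while the real topology stays connected, and then the
partition being healed by temporarily enabling flooding on all remaining
links of a single node.

```python
from collections import defaultdict, deque


def components(nodes, links):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:  # plain breadth-first search
            node = queue.popleft()
            for peer in adj[node]:
                if peer not in comp:
                    comp.add(peer)
                    queue.append(peer)
        seen |= comp
        result.append(comp)
    return result


# Hypothetical fabric: every leaf is attached to every spine (9 links).
spines = ["S1", "S2", "S3"]
leaves = ["L1", "L2", "L3"]
nodes = set(spines + leaves)
real = {frozenset((s, l)) for s in spines for l in leaves}

# Hypothetical flooding topology: a ring, so each node has exactly two paths.
flooding = {frozenset(p) for p in [
    ("S1", "L1"), ("L1", "S2"), ("S2", "L2"),
    ("L2", "S3"), ("S3", "L3"), ("L3", "S1"),
]}

# Two link failures that both happen to hit the flooding topology.
failed = {frozenset(("S1", "L1")), frozenset(("S2", "L2"))}
real_up = real - failed
flood_up = flooding - failed

# Temporarily add every remaining link of one node (L1) to the flooding topology.
healed = flood_up | {link for link in real_up if "L1" in link}

print(len(components(nodes, real_up)))   # real topology
print(len(components(nodes, flood_up)))  # flooding topology after the failures
print(len(components(nodes, healed)))    # after L1 floods on all of its links
```

In this made-up example the data plane still has plenty of ECMP, yet the
flooding topology is split in two until L1 starts flooding on its other
links.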


thanks,
Peter



On Tue, Mar 5, 2019 at 5:42 PM Tony Przygienda <[email protected]
<mailto:[email protected]>> wrote:

    in practical terms +1 to Peter's take here ... Unless we're talking
    tons of failures simultaneously (which AFAI talked to folks are not
    that common but can sometimes happen in DCs BTW due to weird things)
    smaller scale failures with few links would cause potentially
    diffused "chaining" of convergence behavior rather than IGP-style
    fast healing (and on top of that I didn't see a lot of interest in
    formalizing a rigorous distributed algorithm which IMO would be
    necessary to ensure ultimate convergence when only one/subset of
    links is used). Slow convergence is obviously not a good thing
    unless we assume people will run FRR with its complexity in DC
    and/or no more than one link ever fails which seems to me bending
    assumptions to whatever solution is available/preferred. To Tony's
    point though, on large scale failures enabling all links would cause
    heavy flood load, yes, but in a sense it's the "initial bootup" case
    anyway (especially in centralized case) since nodes need all
    topology to make informed correct decisions about what the FT should
    be if they don't rely on whatever the centralized instance thinks
    (which they won't be able to do given the FT from centralized
    instance will indicate lots of links that are "gone" due to failure).
    As to p2p, I suggest to agree whether you use dense mesh (DC) case
    or sparse mesh (WAN) case or "every topology imaginable" since that
    drives lots of design trade-offs.

    my 2.71828182 cents ;-)

    --- tony

    On Tue, Mar 5, 2019 at 8:27 AM Peter Psenak <[email protected]
    <mailto:[email protected]>> wrote:

        Hi Tony,

        On 05/03/2019 17:16 , [email protected] <mailto:[email protected]>
        wrote:
        >
        > Peter,
        >
        >>>    (a) Temporarily add all of the links that would appear to
        remedy the partition. This has the advantage that it is very
        likely to heal the partition and will do so in the minimal
        amount of convergence time.
        >>
        >> I prefer (a) because of the faster convergence.
        >> Adding all links on a single node to the flooding topology is
        not going to cause issues to flooding IMHO.
        >
        >
        > Could you (or John) please explain your rationale behind that?
        It seems counter-intuitive.

        it's limited to the links on a single node. For all practical
        purposes I don't expect a single node to have thousands of
        adjacencies, at least not in the DC topologies for which the
        dynamic flooding is being primarily invented.

        In the environments with a large number of adjacencies (e.g.
        hub-and-spoke) it is likely that we would have to make all these
        links part of the flooding topology anyway, because the spoke is
        typically dual attached to two hubs only. And the incremental
        adjacency bringup is something that an implementation may
        already support.

        >
        >
        >
        >> given that the flooding on the LAN in both OSPF and ISIS is
        done as multicast, there is currently no way to enable flooding,
        either permanent or temporary, towards a subset of the neighbors
        on the LAN. So if the flooding is enabled on a LAN it is done
        towards all routers connected to it.
        >
        >
        > Agreed.
        >
        >
        >> Given that all links between routers are p2p these days, I
        would vote for simplicity and make the LAN always part of the FT.
        >
        >
        > I’m not on board with this yet.  Our simulations suggest that
        this is not necessarily optimal.  There are lots of topologies
        (e.g., parallel LANs) where this blanket approach is suboptimal.

        the question is how much are true LANs used as transit links in
        today's networks.

        thanks,
        Peter

        >
        > Tony
        >
        > .
        >

        _______________________________________________
        Lsr mailing list
        [email protected] <mailto:[email protected]>
        https://www.ietf.org/mailman/listinfo/lsr

