*From:* Tony Li [mailto:[email protected]] *On Behalf Of
*[email protected]
*Sent:* Wednesday, March 6, 2019 10:45 AM
*To:* Huaimo Chen <[email protected]>
*Cc:* Christian Hopps <[email protected]>; [email protected];
[email protected]; [email protected]
*Subject:* Multiple failures in Dynamic Flooding
Hi Huaimo,
I’m sorry that you don’t find it useful. Determining the split is
trivial: when you receive an IIH,
it carries the system ID of the other system. If that other system is
not currently part of the
flooding topology, then it is quite clear that it is disconnected from
the flooding topology.
Repairing the split is done by enabling temporary flooding on the new link.
While an adjacency between two nodes is up, the Hello packets exchanged
between them do not change the node/system IDs they carry.
How do you determine that the other system is not currently part of the
flooding topology?
The IIH includes the system ID. See ISO 10589 v2, section 9.7, field
“source Id”. The local system will have
a copy of the flooding topology and can easily see if the neighbor was
present as of the last FT computation. If not, then it should be
added (modulo rate limiting). The local system can also examine its own
LSDB. If there is no LSP for the neighbor, then it would seem
highly likely that there is a disconnect and the neighbor should again
be added (modulo rate limiting).
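The two checks described above can be sketched as follows. This is only
an illustration, not text from any draft: the function name, the set of
FT members, and the dict-based LSDB keyed by system ID are all
hypothetical stand-ins for whatever an implementation actually keeps.

```python
from typing import Dict, Set


def neighbor_looks_disconnected(
    neighbor_id: str,
    ft_nodes: Set[str],
    lsdb: Dict[str, bytes],
) -> bool:
    """Return True if the IIH source system ID suggests a split.

    neighbor_id: system ID taken from the received IIH.
    ft_nodes:    system IDs present in the last computed flooding
                 topology (hypothetical representation).
    lsdb:        LSPs keyed by originating system ID (hypothetical).
    """
    in_flooding_topology = neighbor_id in ft_nodes
    has_lsp = neighbor_id in lsdb
    # Either missing piece is treated as a reason to enable temporary
    # flooding on the link; rate limiting is applied separately.
    return not in_flooding_topology or not has_lsp
```

A caller would run this on each new adjacency and, if it returns True,
consider the link for temporary flooding subject to rate limiting.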
We are not requiring it, but a system could also do a more extensive
computation and compare the links between itself and the neighbor
by tracing the path in the FT and then confirming that each link is up
in the LSDB.
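The more extensive (optional) check could look roughly like the sketch
below: keep only the FT links that the LSDB still reports as up, then
see whether the neighbor is reachable over them. The link representation
and all names here are assumptions made for the example.

```python
from collections import deque
from typing import Dict, FrozenSet, Set

Link = FrozenSet[str]


def link(a: str, b: str) -> Link:
    """Undirected link between two distinct system IDs."""
    return frozenset((a, b))


def reachable_over_live_ft(
    local: str,
    neighbor: str,
    ft_links: Set[Link],
    lsdb_up_links: Set[Link],
) -> bool:
    """True if neighbor is reachable using FT links that are up in the LSDB."""
    adjacency: Dict[str, Set[str]] = {}
    for live in ft_links & lsdb_up_links:
        a, b = tuple(live)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search from the local system.
    seen = {local}
    queue = deque([local])
    while queue:
        node = queue.popleft()
        if node == neighbor:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

If this returns False, the neighbor is cut off from the local system as
far as the flooding topology is concerned, even if its LSP is still in
the LSDB.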
It normally takes a long time, often more than ten minutes, to age out
and remove an LSP/LSA for the neighbor from the LSDB, even when the
neighbor is physically disconnected.
How can you decide quickly, within tens of milliseconds, that the
flooding topology is disconnected?
There is an issue here that we have not yet resolved, which is the rate
at which new links should be
temporarily added to the flooding topology. Some believe that adding
any new link is the
correct thing to do as it minimizes the recovery time. Others feel that
enabling too many links
could cause a flooding collapse, so link addition should be highly
constrained. We are still
discussing this and invite the WG’s opinions.
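To make the trade-off concrete, here is one possible shape for
"highly constrained" link addition: a sliding-window limiter. The class,
its parameters, and the idea of a fixed window are purely illustrative
assumptions; the thread does not settle on any particular mechanism.

```python
from typing import List


class TemporaryLinkLimiter:
    """Allow at most max_adds temporary link additions per window_s seconds."""

    def __init__(self, max_adds: int, window_s: float) -> None:
        self.max_adds = max_adds
        self.window_s = window_s
        self._added_at: List[float] = []

    def try_add(self, now: float) -> bool:
        """Return True and record the addition if the budget allows it."""
        # Forget additions that have fallen out of the window.
        self._added_at = [t for t in self._added_at if now - t < self.window_s]
        if len(self._added_at) >= self.max_adds:
            return False
        self._added_at.append(now)
        return True
```

A very large max_adds approximates the "add any new link" position,
while a small one approximates the constrained position.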
The issue is resolved by the solutions in draft-cc-lsr-flooding-reduction.
One solution is below, where the given distance can be adjusted/configured.
If we want every node to flood on all its links, we set the given
distance to a large number. If we want the nodes within 2 hops of a
failure to flood on all their links, we set the given distance to 2.
“In one way, when two or more failures on the current flooding
topology occur almost in the same time, each of the nodes within a
given distance (such as 3 hops) to a failure point, floods the link
state (LS) that it receives to all the links (except for the one from
which the LS is received) until a new flooding topology is built.”
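For reference, the set of nodes that the quoted text would switch to
full flooding can be computed with a bounded breadth-first search. This
sketch assumes a failure point is identified by node IDs (for a failed
link, its two endpoints) and that the adjacency map is the physical
topology; both are assumptions of the example, not statements from the
draft.

```python
from collections import deque
from typing import Dict, Iterable, Set


def nodes_within_distance(
    adjacency: Dict[str, Set[str]],
    failure_points: Iterable[str],
    max_hops: int,
) -> Set[str]:
    """Nodes at most max_hops away from any failure point."""
    dist = {node: 0 for node in failure_points}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the configured distance
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)
```

Setting max_hops to a number larger than the network diameter selects
every reachable node, which corresponds to the "big number" case above.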
As we have discussed, this is not a solution. In fact, this is more
dangerous than anything else that has been proposed and
seems highly likely to trigger a cascade failure. You are enabling full
flooding for many nodes. In dense topologies, even
a radius of 3 is very high. For example, in an LS topology, a radius of
3 is sufficient to enable full flooding throughout the
entire topology. If that were stable, we would not need Dynamic Flooding
at all.
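The radius claim can be checked with a small sketch. It reads "LS
topology" as a two-tier leaf-spine fabric in which every leaf connects
to every spine; that reading, the fabric size, and the helper names are
assumptions of the example.

```python
from collections import deque
from typing import Dict, Set


def leaf_spine(num_spines: int, num_leaves: int) -> Dict[str, Set[str]]:
    """Two-tier fabric: every leaf is connected to every spine."""
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    adjacency: Dict[str, Set[str]] = {n: set() for n in spines + leaves}
    for spine in spines:
        for leaf in leaves:
            adjacency[spine].add(leaf)
            adjacency[leaf].add(spine)
    return adjacency


def max_hops_from(adjacency: Dict[str, Set[str]], start: str) -> int:
    """Largest hop count from start to any reachable node (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


fabric = leaf_spine(num_spines=4, num_leaves=16)
worst_case_hops = max(max_hops_from(fabric, node) for node in fabric)
```

In such a fabric no node is more than 2 hops from any other, so a
radius of 3 around any failure point covers every node.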
This full flooding is enabled only for a very short time.
Why do you consider this more dangerous than anything else, and highly
likely to trigger a cascade failure? Can you explain in detail?
Another solution is to temporarily add a minimum set of links to the
flooding topology to repair the split until a new flooding topology
is built.
Agreed. Which links constitute the minimum? In a general topology,
with arbitrary failures that are not distributed globally,
how do we make a distributed decision about which links to enable? This
is the problem that we are trying to solve. And
we have no oracle to tell us The Right Answer.
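For contrast, here is what the computation looks like when one does
have a global view, which is exactly what a distributed system lacks.
The sketch uses union-find to pick a set of candidate links that
reconnects the split flooding topology without redundant additions. The
function and its inputs are hypothetical, and this is not a proposal
for the distributed case.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def repair_links(
    nodes: Iterable[str],
    ft_links: Iterable[Edge],
    candidate_links: Iterable[Edge],
) -> List[Edge]:
    """Pick candidate links that join separate FT components (global view)."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    def union(a: str, b: str) -> bool:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False
        parent[root_a] = root_b
        return True

    # Surviving flooding-topology links define the current components.
    for a, b in ft_links:
        union(a, b)

    # A candidate is kept only if it merges two distinct components.
    return [(a, b) for a, b in candidate_links if union(a, b)]
```

Each node running this independently on a different, partially updated
view could choose different links, which is the distributed decision
problem raised above.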
We can discuss this after the first method is discussed.
Best Regards,
Huaimo
Regards,
Tony
_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr