Re: [Lsr] Min Links for Multiple Failures on Flooding Topology

David Allan I Thu, 18 Apr 2019 11:29:00 -0700

Hi Huaimo:


Thanks for replying, my comments in line prefaced with DA2>

 

<snipped> 

1.      The alternate backup path would appear to also require the criteria of 
being link diverse with the FT if the goal is to protect against multiple 
failures.

 

[HC]: Can you give some more details about this?

[DA] There is a bit of a chain of logic I did not well elucidate…

If we have an FT sufficient to be complete in the presence of any single 
failure, AND if we have a multiple failures situation such that the FT has been 
partitioned, and the information at any node is incomplete, then IMO the 
heuristic to attempt a blind repair with the highest probability of success is 
to 

a.      Assume any observed failure is the worst possible class of failure 
(e.g. node, as if the FT is severed the surrounding nodes will only see one or 
some of the LSAs associated with the node failure).
b.      Attempt to restore using links that are not part of the FT as if I 
assume the probability of multiple failures decreases exponentially in 
proportion to the number of simultaneous failures, it has a higher probability 
of success….

On reflection ‘b’ seems too simplistic, and does not reflect that some 
knowledge of what parts of the FT survive in the partition the node 
contemplating restoration is in, would be available for decision making.  And 
the fact that the concept of path as a response to the failure scenario being 
discussed IMO is not realistic (I elaborate a bit below).

 

[HC2]: Consider the case that there are failures of two links and these two 
links are on the FT and the FT is split by these two failures and no other 
failures follow shortly. In this case, every node has the information about the 
normal/real network topology before the failures, and the indications about 
these two failures. Based on this information, the backup paths for these two 
failures can be computed and enabled for temporary flooding. Thus the FT 
partition is fixed. 

DA2> Frankly, I cringe a bit at this characterization. Part of this is that:

*       there isn’t a way to instantiate a path, and IMO to maximize coverage 
against all possible scenarios it should be a unilateral action by a single 
node for each direction across the partition, not something that requires 
coordinated action by a number of potentially unsynchronized nodes
*       the statement “no other failures follow shortly” IMO is wishful thinking
*       the nodes on either side of a partition are all FT participants so what 
is actually needed is a node to self elect to flood information on a non-FT 
adjacency that by some heuristic it believes crosses the chasm….which then 
starts really looking like “adding links until the partition(s) is/are fixed”….

So while thinking a path can solve a partition, as there is no method of 
signaling it, or reliably depending on a distributed solution to create it, it 
therefore it would have to be composed of FT adjacencies, with the exception of 
the one that crosses the partition.  Which collapses down to the case under 
consideration and would be the only link that mattered.  

 

So IMO at best what you propose may a heuristic that would be a guideline for 
the first try to add a link but is not a complete solution in itself.

 

If the backup path for one failure on the FT does not share any link with the 
backup path for the other failure on the FT (Note that this can be achieved 
through using diverse links when computing backup paths), the backup paths can 
also survive the failure of any other link that is not on the FT and happens at 
almost the same time as the two failures of the two links on the FT.  

Consider the case that there is another failure of one link on the FT in 
addition to these two failures. This failure is on one partition of the FT and 
not on the other partition. In this case, the information about the other 
failure will flood through the backup paths. Moreover, the backup path for the 
other failure is computed and enabled for temporary flooding in its partition. 
The information about the two failures will food through this backup path. Thus 
the topology is converged quickly.  

In order to speed up the convergence further, we may increase the connectivity 
on the FT when there are failures on the FT to let every node have more 
information about failures. In one way, when a link attached to a node fails 
and the link is on the FT, we find another link attached to the node and not on 
the FT, and add it to the FT temporarily. In another way, only after two or 
more failures on the FT happen, for a link attached to a node that is on the FT 
and fails, we find another link attached to the node and not on the FT, and add 
it to the FT temporarily.

DA2>  I think we are talking past each other, as I see the problem as one of 
sets once partitioning occurs. I need a means of reconnecting disconnected sets 
of FT members, where within a given set there is full connectivity and in a 
reasonably rich topology there would be any number of candidate links that 
could be added that would eliminate the partitions. Paths is “a” way to think 
of this, but as noted above, instantiating one will by necessity reduce to 
adding a link to the FT, so it would merely be a guideline for selecting a link 
with a high probability of reconnecting severed sets in the presence of 
incomplete information.

 

 

2.      If node failures are considered, I’m not sure what criteria is used to 
deem a backup path as useful…..

 

[HC]: Regarding to the failure of a node X on the FT, suppose that there are 
multiple (i.e., two or more) nodes that were connected to the failed node X 
through the links on the FT. For each pair of these multiple nodes, a backup 
path between this pair is computed and enabled for temporary flooding. Thus the 
backup paths will connect these multiple nodes on the FT, and the FT partition 
caused by multiple failures including the failure of node X is fixed through 
the backup paths for the failed node X and the backup paths for the other 
failures. 

 

For example, if the failed node X was connected to two nodes Ri and Rj (assume 
that Ri’s ID < Rj’s ID) by the links on the FT before node X fails, there is 
only one pair of nodes: (Ri, Rj). A unique backup path from Ri to Rj is 
computed and enabled for temporary flooding. This backup path will connect Ri 
and Rj on the FT and fix the FT partition caused by multiple failures with the 
backup paths for the other failures.

 

In another example, if the failed node X was connected to three nodes Ri, Rj 
and Rk (assume that Ri’s ID < Rj’s ID < Rk’s ID) by the links on the FT before 
node X fails, there are three pairs of nodes: (Ri, Rj), (Ri, Rk) and (Rj, Rk). 
A unique backup path from Ri to Rj, a unique backup path from Ri to Rk, and a 
unique backup path from Rj to Rk are computed and enabled for temporary 
flooding. These three backup paths will connect three nodes Ri, Rj and Rk on 
the FT, and fix the FT partition caused by multiple failures with the backup 
paths for the other failures.

 

DA> Again I need to back this up a bit and incorporate a bit more subsequent 
reflection in my response.  What I was referring to blurred two discussions, 
adding links in response to severing and your post where path establishment 
seemed to be based on a previously known network state.

 

As observed above, I do not think a restoration strategy focused on a repair 
path that assumes a link failure will do anything useful for the partitioning 
scenario under consideration.  I also do not see a simple heuristic for a 
collection of nodes that are blind to the overall state of the FT to create a 
new path in the FT as a distributed response and where no signaling is 
involved.  I’d assume that is why what is being discussed is to add links 
temporarily as that is about the only strategy that can work with unilateral 
decisions by single nodes not acting in a coordinated fashion….If a path is 
required, a node trying to instantiate a portion of the path cannot depend on 
its neighbor to independently come to the same conclusion, in fact for an 
actual repair the opposite is just about guaranteed.  

 

[HC2]: The backup paths are computed based on the information there. Even 
though some information may be missing, the backup paths for a failure are 
created by the nodes close (or local) to the failure and connect two 
partitions. Through locally repairing the partitions and iterations, the 
topology is converged quickly. 

I considered to use signaling to setup backup paths. However, this is 
complicated since we may introduce a simplified tunneling protocol into IGP.  
In an IP network,  when an IP packet with a destination D is sent from any 
node, this packet will travel to D along a path. This path is “created” as a 
distributed response to topology changes and no signaling is used. 

DA2> Well you’ve now seen my comments above on instantiation and utility of 
paths. Tunneling is a solution I had not considered, and I’m insufficiently an 
expert to consider all the ramifications, such as does that tunnel become an 
adjacency and forward non IGP traffic… etc. (like an MPLS FA)…And I again 
cringe and the security implications of accepting any control plane packet that 
randomly arrived on my doorstep.

 

I hope this helps

Dave

smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
Lsr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/lsr

Re: [Lsr] Min Links for Multiple Failures on Flooding Topology

Reply via email to