Hi Gunter,

I would like to counteract some of the Fear, Uncertainty, and Doubt that has misled you:

> Some concerns I see with a flooding leader approach:
>
> Single Point of Failure (the flooding leader becomes a critical component in
> the network. If the leader fails or becomes unreachable, it can disrupt the
> flooding process until a new leader is elected or the network converges to an
> alternative state)

The leader is not a single point of failure, provided that the operator enables multiple potential leaders. Loss of a leader does not disrupt the flooding process. Flooding continues on the flooding topology until a new flooding topology is computed.

> Increased Complexity (Implementing a flooding leader requires additional
> mechanisms for leader election, maintenance, and failure detection)

The leader election algorithm is taken directly from DIS election. If you have a LAN in your network, you are already running this algorithm. Leader failure detection falls out of SPF already: after SPF, if the leader is disconnected, then the local node needs to execute leader election.

> Scalability Concerns (in large-scale or highly dynamic networks, managing a
> single flooding leader can become a bottleneck)

Management of the leader is trivial. The leader election is O(N) in the number of nodes in the LSDB. This is cheaper than SPF. If you can run SPF, then you can run leader election. I’ll note that all of the flooding topology computations that have been proposed are also more expensive than O(N), so if you can’t afford leader election, you can’t afford to run a flooding topology computation either.

> Convergence Delays (when the flooding leader fails, the network must initiate
> a leader re-election process)

Leader failure does not affect convergence. Regardless of the leader’s status, flooding proceeds on the pre-computed flooding topology. Only after the topology change and subsequent SPF is the leader failure noticed and a new leader election run. That will subsequently generate a new flooding topology. There is always a valid flooding topology.
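
As a rough illustration of the election and failure-detection points above, here is a minimal Python sketch. It is not taken from any implementation or specification: the `Node` record, the field names, and both function names are hypothetical, and breaking priority ties on the highest system ID is an assumption made for the example. It models only the leader-loss trigger, not re-election after a priority change.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Node:
    system_id: str   # hypothetical identifier, compared as a plain string
    eligible: bool   # node advertises that it is willing to be leader
    priority: int    # advertised leader priority (higher wins)


def elect_leader(nodes: Iterable[Node], reachable: Set[str]) -> Optional[Node]:
    """One linear pass over the nodes in the LSDB: O(N)."""
    best: Optional[Node] = None
    for node in nodes:
        # Only eligible nodes that the last SPF run could reach may win.
        if not node.eligible or node.system_id not in reachable:
            continue
        # Assumed tie-break: highest priority, then highest system ID.
        if best is None or (node.priority, node.system_id) > (
            best.priority,
            best.system_id,
        ):
            best = node
    return best


def leader_after_spf(
    current: Optional[str], nodes: Iterable[Node], reachable: Set[str]
) -> Optional[str]:
    """Called once SPF has produced the set of reachable system IDs."""
    # Leader still reachable: nothing to do, no election is run.
    if current is not None and current in reachable:
        return current
    # Leader was disconnected (or never existed): run the linear scan.
    winner = elect_leader(nodes, reachable)
    return winner.system_id if winner is not None else None
```

The only extra work on the normal path is a single set-membership test on data that SPF has already produced; the scan itself runs only when the leader has actually disappeared.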

> Lack of Redundancy (relying on a single leader reduces redundancy in the
> flooding process)

An area should never, ever have a single leader candidate, unless it is the only node in the area. An operator may configure every node to be a leader candidate, if that is appropriate for the network. The redundancy in the flooding process is a result of redundancy in the flooding topology and has nothing whatsoever to do with the leader or leader election. The specific flooding topology is the result of the flooding algorithm selection.

> Overhead of Leader Maintenance (continuous monitoring is required to ensure
> the flooding leader is operational)

The cost of leader maintenance is zero. SPF is already a sunk cost. As a side effect of normal SPF, the protocol infrastructure must already detect disconnected nodes and withdraw affected routes. It is at this point in the code that a new leader election should be triggered, and not before. If the leader is still reachable, a good implementation does not need to execute a single solitary additional instruction.

> Potential for Suboptimal Flooding Paths (the flooding leader may not always
> have the most efficient paths to all nodes, especially in dynamic topologies)

This has nothing to do with a leader and is solely a property of the flooding topology algorithm. The leader selects the algorithm, nothing more. Flooding is not done along paths; it is always hop-by-hop. If an algorithm chooses poorly, then yes, the flooding topology can be wildly sub-optimal. An operator should spend SERIOUS amounts of effort understanding how any given algorithm behaves in their network. An algorithm that produces a flooding topology that is a giant cycle with a latency of 500ms will impact convergence. An algorithm that causes a single weak node to have an undue burden will also affect convergence. An algorithm that does not produce a bi-connected topology will impact network resilience. I could go on.
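
One of the properties mentioned above, bi-connectivity, is easy to check offline for a candidate flooding topology. The sketch below is a deliberately brute-force illustration and not any proposed flooding algorithm: it takes "bi-connected" to mean that the topology is connected and stays connected after any single node is removed, and it assumes the topology is given as a symmetric adjacency dictionary. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_connected(adj: Graph, skip: Optional[Hashable] = None) -> bool:
    """Breadth-first search over the topology, ignoring the node `skip`."""
    nodes = [n for n in adj if n != skip]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        cur = queue.popleft()
        for nbr in adj[cur]:
            if nbr != skip and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)


def survives_any_single_node_failure(adj: Graph) -> bool:
    """True if the topology is connected and has no articulation point.

    Brute force: one BFS per removed node, O(N * (N + E)).
    """
    if not is_connected(adj):
        return False
    return all(is_connected(adj, skip=node) for node in adj)


# A four-node ring tolerates the loss of any one node ...
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
# ... while a three-node chain is cut in two when the middle node fails.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

A check like this says nothing about latency or per-node load, which still have to be evaluated separately for the algorithm in question.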

> Complex Recovery Mechanisms (recovering from leader failures may involve
> complex procedures that differ from standard link-state protocol operations)

The recovery mechanism from leader failure is to elect a new leader. This is not complex; it is a linear scan of the nodes in the area, looking for those nodes that are eligible and comparing their advertised priorities. It is exactly analogous to DIS election. It takes less than a millisecond.

> I believe there is place for both a flooding leader and leaderless
> architecture. It depends upon type of network where this is implemented (for
> example Datacenter or Service Provider WAN).

I have no issues with a leaderless architecture, if we can actually demonstrate one that will preserve network reliability, stability, and performance. So far, I haven’t seen one other than ubiquitous configuration.

I fail to see any distinction between DC or WAN for this. A dense network is a dense network.

Tony
