[homenet] More about desynchronisation [was: Why configuration and routing are separate]

Juliusz Chroboczek Sat, 23 Jul 2016 05:39:43 -0700

> Having these two protocols knowing nothing about each other and each
> others' state is potential source of problems.


I've thought about it some more (warm showers increase the flow of blood
to my brain), and I've realised that this is an important point, and one
that is close to my research interests.  So please let me ramble some more.

Given a distributed system (such as, say, a network of IS-IS nodes, or
a network of Kademlia nodes, or a network of HNCP + Babel nodes), in
transitory situations the states of different nodes will temporarily be
inconsistent.  For example, the LS databases of OSPF nodes will
temporarily desynchronise, a node will leave the Kademlia network without
notifying its neighbours, or HNCP and Babel will have different ideas
about an adjacency being up or down.  I believe this is a law of nature --
but I haven't been able to find the theorem that says that.
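
To make the LS database example concrete, here is a toy sketch (not
a model of any real protocol).  It assumes nodes arranged in a line,
synchronous flooding rounds, and a database that is just a set of LSA
labels; the names flood_round and rounds_until_consistent are made up for
the illustration.  It counts the rounds during which at least two nodes
disagree.

```python
# Toy model: nodes in a line, each holding a link-state database
# represented as a set of LSA labels. All names here are hypothetical.

def flood_round(dbs):
    """Return the databases after one synchronous flooding round.

    Every node copies everything it knew at the start of the round
    to its direct neighbours (index - 1 and index + 1).
    """
    n = len(dbs)
    new = [set(db) for db in dbs]
    for i, db in enumerate(dbs):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                new[j] |= db
    return new


def rounds_until_consistent(dbs):
    """Count the flooding rounds needed before all databases are equal."""
    rounds = 0
    while any(db != dbs[0] for db in dbs):
        dbs = flood_round(dbs)
        rounds += 1
    return rounds


# Five nodes in a line; node 0 originates a new LSA.
initial = [{"lsa-1"}, set(), set(), set(), set()]
print(rounds_until_consistent(initial))
```

Under these assumptions the window of inconsistency grows with the
distance the update has to travel, which is why shrinking it takes
careful engineering rather than coming for free.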

(Aside: I'm positive there's a theorem that says that distributed consensus
cannot be achieved by a symmetric, deterministic algorithm, but
disappointingly enough this doesn't apply to networking protocols which use
random backoff (non-deterministic) or break ties by picking the larger
node-id (broken symmetry).  End of aside.)

It is generally considered ungentlemanly to attempt to change the laws of
nature, so people have taken two approaches to work around the problem:

  (1) minimise the time window during which nodes are desynchronised;
  (2) minimise the consequences of desynchronisation.

To give an example of approach (1), link-state routing protocols have been
carefully engineered to reflood in a timely and reliable manner -- this
aims to minimise the time during which the network is pushing packets in
the wrong direction.  Examples of (2) include Kademlia, which is designed
to survive large numbers of broken nodes (our measurements indicate that
40 to 60% of nodes in the BitTorrent DHT are broken), and protocols such
as EIGRP, DSDV or Babel, which are designed to keep pushing packets in the
right direction following loop-free paths even during reconvergence.
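
As an illustration of approach (2), here is a minimal sketch of the kind
of feasibility check that distance-vector protocols in this family rely
on to avoid loops during reconvergence.  It is a simplification of
Babel's feasibility condition: it ignores sequence number wrap-around and
retractions, and the names FeasibilityDistance and is_feasible are
invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeasibilityDistance:
    """Best (seqno, metric) pair this node has advertised for a source."""
    seqno: int
    metric: int


def is_feasible(update_seqno: int, update_metric: int,
                fd: Optional[FeasibilityDistance]) -> bool:
    """Simplified feasibility check (assumption: no seqno wrap-around).

    An update is accepted if nothing is recorded for the source, if it
    carries a newer sequence number, or if it carries the same sequence
    number and a strictly smaller metric than the recorded one.
    """
    if fd is None:
        return True
    if update_seqno > fd.seqno:
        return True
    return update_seqno == fd.seqno and update_metric < fd.metric


fd = FeasibilityDistance(seqno=7, metric=100)
print(is_feasible(7, 80, fd))    # same seqno, strictly better metric
print(is_feasible(7, 120, fd))   # same seqno, worse metric
print(is_feasible(8, 120, fd))   # newer seqno
```

A node that only selects routes passing this check may temporarily have
no route at all, but it will not forward into a loop; that is the price
paid for tolerating desynchronisation instead of trying to eliminate it.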

Note that the two approaches are not exclusive -- just because Kademlia
can deal with broken nodes doesn't mean we should be encouraging
brokenness, and just because Babel is loop-avoiding doesn't mean we should
be slowing down its convergence.  (I am not aware of any work on the
transient behaviour of link-state protocols; perhaps the OLSR community
has done something related?)

I don't think it has ever been stated explicitly, but my view is that (2)
is what we are doing with HNCP + Babel: allowing the two protocols to
transiently have different ideas about adjacencies, and making sure that
nothing important breaks when that happens.  Of course, just because
nothing breaks doesn't mean that it's a desirable situation, so adding
(optional) mechanisms to minimise the likelihood of this happening is
something we should be considering.  I'm not sure that Henning's
suggestion is worth the implementation cost, however.

-- Juliusz

_______________________________________________
homenet mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/homenet
