On Tue, 8 Jan 2019 at 17:11, Christian Decker
<decker.christ...@gmail.com> wrote:
> Rusty Russell <ru...@rustcorp.com.au> writes:
> > Fortunately, this seems fairly easy to handle: discard the newer
> > duplicate (unless > 1 week old).  For future more advanced
> > reconstruction schemes (e.g. INV or minisketch), we could remember the
> > latest timestamp of the duplicate, so we can avoid requesting it again.
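A minimal sketch of the rule above, in case it helps to pin it down. I am assuming "unless > 1 week old" means a duplicate is accepted once it is more than a week newer than the stored copy. `ChannelUpdate` is a hypothetical, trimmed subset of the real message, and `GossipStore` and `latest_dup_ts` are made-up names, not any implementation's code:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


@dataclass(frozen=True)
class ChannelUpdate:
    # Trimmed, hypothetical subset of channel_update fields.
    short_channel_id: str
    timestamp: int
    disabled: bool  # stands in for the disable bit in channel_flags
    fee_base_msat: int

    def content(self) -> Tuple[str, bool, int]:
        """Everything except the timestamp (the signature is not modelled)."""
        return (self.short_channel_id, self.disabled, self.fee_base_msat)


@dataclass
class GossipStore:
    updates: Dict[str, ChannelUpdate] = field(default_factory=dict)
    # Latest timestamp of a discarded duplicate, per channel, so a future
    # reconciliation scheme could avoid requesting that update again.
    latest_dup_ts: Dict[str, int] = field(default_factory=dict)

    def handle(self, upd: ChannelUpdate) -> bool:
        """Store `upd` if accepted and return True if it should be relayed."""
        old = self.updates.get(upd.short_channel_id)
        if old is not None:
            if upd.timestamp <= old.timestamp:
                return False  # stale or replayed update
            same = upd.content() == old.content()
            if same and upd.timestamp - old.timestamp <= ONE_WEEK:
                self.latest_dup_ts[upd.short_channel_id] = upd.timestamp
                return False  # newer duplicate: discard it
        self.updates[upd.short_channel_id] = upd
        self.latest_dup_ts.pop(upd.short_channel_id, None)
        return True


store = GossipStore()
scid = "103x1x0"  # hypothetical short_channel_id
print(store.handle(ChannelUpdate(scid, 1_000, False, 1_000)))             # True
print(store.handle(ChannelUpdate(scid, 2_000, False, 1_000)))             # False (duplicate)
print(store.handle(ChannelUpdate(scid, 2_000 + ONE_WEEK, False, 1_000)))  # True (> 1 week newer)
```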
> Unfortunately this assumes that you have a single update partner, and
> still results in flaps, and might even result in a stuck state for some
> channels.
> Assume that we have a network in which a node D receives the updates
> from a node A through two or more separate paths:
> A --- B --- D
>  \--- C ---/
> And let's assume that some channel of A (c_A) is flapping (not the ones
> to B and C). A will send out two updates: one disables c_A and the other
> re-enables it; otherwise they are identical (timestamp and signature
> differ as well, of course). The flush interval in B is long enough
> to see both updates before flushing, hence both updates get dropped and
> nothing apparently changed (D doesn't get told about anything from
> B). The flush interval of C triggers after getting the disable but
> before the re-enable, so D gets the disabling update, followed by the
> enabling update once C's
> flush interval triggers again. Worse if the connection A-C gets severed
> between the updates, now C and D learned that the channel is disabled
> and will not get the re-enabling update since B has dropped that one
> altogether. If B now gets told by D about the disable, it'll also go
> "ok, I'll disable it as well", leaving the entire network believing that
> the channel is disabled.
> This is really hard to debug, since A has sent a re-enabling
> channel_update, but everybody is stuck in the old state.
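The stuck state is easy to reproduce with a toy model. This is a sketch under simplifying assumptions of my own: each node processes only the newest pending update per flush interval, forwards it only if it changed its stored state, and applies the "discard newer duplicates" rule; `Update`, `accept` and `Node` are hypothetical names and `enabled` stands in for all non-timestamp fields:

```python
from dataclasses import dataclass
from typing import List, Optional

ONE_WEEK = 7 * 24 * 60 * 60  # the "unless > 1 week old" exception, in seconds


@dataclass(frozen=True)
class Update:
    timestamp: int
    enabled: bool  # stands in for every field except timestamp and signature


def accept(stored: Update, new: Update) -> Update:
    """Keep `stored` if `new` is older or a recent duplicate of it."""
    if new.timestamp <= stored.timestamp:
        return stored
    if new.enabled == stored.enabled and new.timestamp - stored.timestamp <= ONE_WEEK:
        return stored  # discard the newer duplicate
    return new


class Node:
    def __init__(self, initial: Update) -> None:
        self.stored = initial
        self.pending: List[Update] = []

    def receive(self, update: Update) -> None:
        self.pending.append(update)

    def flush(self) -> Optional[Update]:
        """End of a flush interval: process only the newest pending update
        and return it if it should be forwarded, else None."""
        if not self.pending:
            return None
        latest = max(self.pending, key=lambda u: u.timestamp)
        self.pending.clear()
        kept = accept(self.stored, latest)
        changed = kept is not self.stored
        self.stored = kept
        return kept if changed else None


u0 = Update(0, True)     # c_A enabled before the flap
u1 = Update(100, False)  # A disables c_A
u2 = Update(200, True)   # A re-enables c_A
b, c, d = Node(u0), Node(u0), Node(u0)

# B sees both updates in one flush interval: u2 duplicates u0, so B sends nothing.
b.receive(u1)
b.receive(u2)
from_b = b.flush()

# C flushes after u1; then the A-C link is severed, so u2 never reaches C.
c.receive(u1)
d.receive(c.flush())
b.receive(d.flush())  # D stores u1 and passes it back to B
b.flush()

print(from_b, [n.stored.enabled for n in (b, c, d)])  # None [False, False, False]
```

A still holds u2 (enabled), but B, C and D all end up believing c_A is disabled.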

I think there may even be a simpler case where not replacing updates
will result in nodes not knowing that a channel has been re-enabled:
suppose you got 3 updates U1, U2, U3 for the same channel: U2 disables
it, and U3 enables it again and is otherwise the same as U1. If you
discard U3 and just keep U1, and your peer has U2, how will you tell them
that the channel has been enabled again? Unless "discard" here means keep
the update but don't broadcast it?
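To make this concrete, here is a small sketch. It assumes you held U1 and missed U2 when U3 arrived, and that peers only offer each other updates newer than the copy the other side already has (timestamp-based sync); `receive`, `offer` and `keep_duplicates` are hypothetical names:

```python
from typing import NamedTuple, Optional


class Update(NamedTuple):
    timestamp: int
    enabled: bool  # stands in for every field except timestamp/signature


def receive(stored: Update, new: Update, keep_duplicates: bool) -> Update:
    """Return the update kept after `new` arrives.

    keep_duplicates=False: discard a newer update whose content matches.
    keep_duplicates=True: store it anyway (it just isn't broadcast).
    """
    if new.timestamp <= stored.timestamp:
        return stored
    if new.enabled == stored.enabled and not keep_duplicates:
        return stored
    return new


def offer(mine: Update, peer_has: Update) -> Optional[Update]:
    """Timestamp-based sync: we can only offer something newer than the peer's copy."""
    return mine if mine.timestamp > peer_has.timestamp else None


u1 = Update(100, True)
u2 = Update(200, False)  # disables the channel
u3 = Update(300, True)   # re-enables it; same content as u1

# We hold u1 and missed u2; our peer holds u2.
for keep in (False, True):
    mine = receive(u1, u3, keep_duplicates=keep)
    print(keep, offer(mine, u2))
# False None
# True Update(timestamp=300, enabled=True)
```

Only the "keep it but don't broadcast it" reading lets us correct the peer.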

> Locally updating timestamp and signature for identical updates, and then
> not broadcasting them if those were the only changes, would at least
> prevent the last issue of overriding a dropped state with an earlier
> one, but it'd still leave C and D in an inconsistent state until we have
> some sort of passive sync that compares routing tables and fixes these
> issues.
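A sketch of that variant, assuming a newer update always replaces the stored one but is relayed only when something other than timestamp and signature changed; `Update` and `handle` are hypothetical names and `enabled` stands in for the remaining fields:

```python
from typing import NamedTuple, Tuple


class Update(NamedTuple):
    timestamp: int
    enabled: bool  # stands in for all fields other than timestamp/signature


def handle(stored: Update, new: Update) -> Tuple[Update, bool]:
    """Return (update to keep, whether to relay it).

    A newer update always replaces the stored one, so timestamp and
    signature stay current, but it is only relayed if something other
    than timestamp/signature changed.
    """
    if new.timestamp <= stored.timestamp:
        return stored, False
    return new, new.enabled != stored.enabled


u0 = Update(0, True)
u1 = Update(100, False)  # disable
u2 = Update(200, True)   # re-enable, same content as u0

# B coalesced u1 and u2 in one flush interval, so it only processes u2.
b_state, relay = handle(u0, u2)
print(b_state, relay)          # Update(timestamp=200, enabled=True) False

# Later D relays the disable it got via C; it is older, so B ignores it.
b_state, relay = handle(b_state, u1)
print(b_state.enabled, relay)  # True False
```

B no longer flips back to disabled, but nothing here repairs C and D.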

But then there's a risk that nodes would discard channels as stale
because they don't get new updates when they reconnect.
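One way to see the risk, as a sketch: assume nodes prune a channel whose latest channel_update is older than two weeks (my reading of BOLT #7; treat the constant as an assumption), and that the channel owner re-signs an identical update every 5 days as a keepalive (a hypothetical schedule). If those refreshes are stored locally but never relayed, the peer's copy simply ages out:

```python
DAY = 24 * 60 * 60
PRUNE_AFTER = 14 * DAY  # assumed pruning window (two weeks, as in BOLT #7)


def is_stale(last_update_ts: int, now: int) -> bool:
    """True if a node following the pruning rule would drop the channel."""
    return now - last_update_ts > PRUNE_AFTER


# Hypothetical schedule: the channel owner re-signs an otherwise identical
# channel_update every 5 days as a keepalive.
refreshes = [0, 5 * DAY, 10 * DAY]
now = 15 * DAY

ours = max(refreshes)   # we refreshed locally on every keepalive
theirs = refreshes[0]   # but never relayed them, so the peer only has the first

print(is_stale(ours, now), is_stale(theirs, now))  # False True
```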

> I think all the bolted-on things are pretty much overkill at this point:
> it is unlikely that we will get any consistency in our views of the
> routing table, but that's actually not needed to route, and we should
> consider this a best-effort gossip protocol anyway. If the routing
> protocol is too chatty, we should make efforts towards local policies at
> the senders of the update to reduce the number of flapping updates, not
> build in-network deduplication. Maybe something like "eager-disable"
> and "lazy-enable" is what we should go for, in which disables are sent
> right away, and enables are put on an exponential backoff timeout (after
> all, what use are flappy nodes for routing?).
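A possible shape for the "eager-disable / lazy-enable" policy at the sender, as a sketch only: `LazyEnable`, its methods and the 60 s base / 3600 s cap delays are all hypothetical choices, and the return values stand for "send a disabling/enabling channel_update now":

```python
from typing import Optional


class LazyEnable:
    """Hypothetical sender-side policy: disables go out immediately,
    re-enables wait for an exponentially growing delay."""

    def __init__(self, base_delay: float = 60.0, max_delay: float = 3600.0):
        # Both delays are assumed values, in seconds.
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.delay = base_delay
        self.advertised_enabled = True
        self.enable_at: Optional[float] = None

    def on_disconnect(self, now: float) -> Optional[str]:
        self.enable_at = None  # a pending re-enable is cancelled
        if self.advertised_enabled:
            self.advertised_enabled = False
            return "disable"  # eager: sent right away
        return None

    def on_reconnect(self, now: float) -> None:
        self.enable_at = now + self.delay
        self.delay = min(self.delay * 2, self.max_delay)  # back off further next time

    def tick(self, now: float) -> Optional[str]:
        if self.enable_at is not None and now >= self.enable_at:
            self.enable_at = None
            self.advertised_enabled = True
            return "enable"
        return None


p = LazyEnable()
events = [p.on_disconnect(0)]   # disable sent immediately
p.on_reconnect(10)              # re-enable scheduled for t=70
events.append(p.tick(30))
events.append(p.on_disconnect(40))  # flaps again: nothing new to send
p.on_reconnect(50)              # backoff doubled: re-enable at t=170
events.append(p.tick(100))
events.append(p.tick(170))
print(events)  # ['disable', None, None, None, 'enable']
```

A real policy would presumably also shrink the delay again after a long stable period; this sketch never does.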

Yes, there are probably heuristics that would help reduce gossip
traffic, and I see your point, but I was thinking about doing the
opposite: "eager-enable" and "lazy-disable", because from a sender's
point of view, trying to use a disabled channel is better than ignoring
an enabled channel.
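A sketch of the opposite policy, under assumptions of my own: enables go out at once, while disables are held for a grace period and dropped if the peer comes back in time, so short flaps are never advertised. `LazyDisable`, its methods and the 60 s grace period are hypothetical:

```python
from typing import Optional


class LazyDisable:
    """Hypothetical sender-side policy: enables go out right away, disables
    are held back for `grace` seconds and dropped if the peer reconnects."""

    def __init__(self, grace: float = 60.0):  # assumed grace period, seconds
        self.grace = grace
        self.advertised_enabled = True
        self.disable_at: Optional[float] = None

    def on_disconnect(self, now: float) -> None:
        if self.advertised_enabled and self.disable_at is None:
            self.disable_at = now + self.grace

    def on_reconnect(self, now: float) -> Optional[str]:
        self.disable_at = None  # cancel any pending disable
        if not self.advertised_enabled:
            self.advertised_enabled = True
            return "enable"  # eager: sent right away
        return None

    def tick(self, now: float) -> Optional[str]:
        if self.disable_at is not None and now >= self.disable_at:
            self.disable_at = None
            self.advertised_enabled = False
            return "disable"
        return None


p = LazyDisable(grace=60)
out = []
p.on_disconnect(0)
out.append(p.tick(30))
out.append(p.on_reconnect(40))   # short flap: never advertised
p.on_disconnect(100)
out.append(p.tick(160))          # grace expired: disable sent
out.append(p.on_reconnect(200))  # enable sent immediately
print(out)  # [None, None, 'disable', 'enable']
```

The cost is that, during the grace period, senders may still try a channel that is actually down, which is the trade-off argued for above.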

Lightning-dev mailing list
