Rusty Russell <ru...@rustcorp.com.au> writes:
>> But only 18 000 pairs of channel updates carry actual fee and/or HTLC
>> value change. 85% of the time, we just queried information that we
>> already had!
>
> Note that this can happen in two legitimate cases:
> 1. The weekly refresh of channel_update.
> 2. A node updated too fast (A->B->A) and the ->A update caught up with the
>    ->B update.
>
> Fortunately, this seems fairly easy to handle: discard the newer
> duplicate (unless > 1 week old). For future more advanced
> reconstruction schemes (eg. INV or minisketch), we could remember the
> latest timestamp of the duplicate, so we can avoid requesting it again.
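
For reference, the quoted rule can be sketched roughly as follows: an incoming channel_update is dropped when it only repeats what is already stored, unless the stored copy is more than a week old. The `ChannelUpdate` fields and the `is_redundant` helper are hypothetical names chosen for illustration, only a few policy fields are modelled, and the signature is left out entirely.

```python
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 3600  # the weekly refresh interval from the quoted text


@dataclass(frozen=True)
class ChannelUpdate:
    # Hypothetical, simplified subset of channel_update fields.
    short_channel_id: int
    timestamp: int  # seconds since epoch
    disabled: bool
    fee_base_msat: int
    fee_proportional_millionths: int
    htlc_minimum_msat: int
    htlc_maximum_msat: int

    def content(self):
        """Everything except the timestamp (the signature is not modelled)."""
        return (self.short_channel_id, self.disabled, self.fee_base_msat,
                self.fee_proportional_millionths, self.htlc_minimum_msat,
                self.htlc_maximum_msat)


def is_redundant(stored: ChannelUpdate, incoming: ChannelUpdate) -> bool:
    """True if `incoming` can be discarded under the quoted rule."""
    if incoming.timestamp <= stored.timestamp:
        return True  # not newer than what we already have
    same_content = incoming.content() == stored.content()
    # Discard the newer duplicate unless it is more than one week newer.
    return same_content and incoming.timestamp - stored.timestamp <= WEEK_SECONDS
```

Note that this check is purely local to one node and one stored copy per channel direction.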

Unfortunately this assumes that you have a single update partner, and it
still results in flaps, and might even result in a stuck state for some
channels.

Assume that we have a network in which a node D receives the updates
from a node A through two or more separate paths:

  A --- B --- D
   \--- C ---/

And let's assume that some channel of A (c_A) is flapping (not the ones
to B and C). A will send out two updates: one disables c_A and the other
re-enables it; otherwise they are identical (timestamp and signature
differ as well, of course). The flush interval in B is long enough for B
to see both updates before flushing, hence both updates get dropped and
nothing apparently changed (D doesn't get told about anything from B).
The flush interval of C triggers after C has received the disable but
before it receives the re-enable, so D gets the disabling update,
followed by the enabling update once C's flush interval triggers again.

Worse, if the connection A-C gets severed between the updates, C and D
have learned that the channel is disabled and will not get the
re-enabling update, since B has dropped that one altogether. If B now
gets told by D about the disable, it'll also go "ok, I'll disable it as
well", leaving the entire network believing that the channel is
disabled.

This is really hard to debug, since A has sent a re-enabling
channel_update, but everybody is stuck in the old state.

Locally updating the timestamp and signature for identical updates, and
then not broadcasting if those were the only changes, would at least
prevent the last issue of overriding a dropped state with an earlier
one, but it'd still leave C and D in an inconsistent state until we have
some sort of passive sync that compares routing tables and fixes these
issues.

>> Adding a basic checksum (4 bytes for example) that covers fees and
>> HTLC min/max value to our channel range queries would be a significant
>> improvement and I will add this the open BOLT 1.1 proposal to extend
>> queries with timestamps.
>>
>> I also think that such a checksum could also be used
>> - in “inventory” based gossip messages
>> - in set reconciliation schemes: we could reconcile [channel id |
>> timestamp | checksum] first
>
> I think this is overkill?

I think all the bolted-on things are pretty much overkill at this point.
It is unlikely that we will get any consistency in our views of the
routing table, but that's actually not needed to route, and we should
consider this a best-effort gossip protocol anyway. If the routing
protocol is too chatty, we should make efforts towards local policies at
the senders of the updates to reduce the number of flapping updates, not
build in-network deduplication.

Maybe something like "eager-disable" and "lazy-enable" is what we should
go for, in which disables are sent right away, and enables are put on an
exponential backoff timeout (after all, what use are flappy nodes for
routing?).

Cheers,
Christian
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
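
As a rough illustration of the "eager-disable" / "lazy-enable" idea above, a sender-side policy might look like the sketch below: disables are announced immediately, while each re-enable has to wait out a delay that doubles with every flap. The class name, the base delay, the cap, and the choice to reset the backoff after a long period without disables are all assumptions made for this example; none of them come from a spec or an existing implementation.

```python
class LazyEnablePolicy:
    """Hypothetical per-channel policy: disable eagerly, enable lazily."""

    def __init__(self, base_delay=60.0, max_delay=3600.0, quiet_period=6 * 3600.0):
        self.base_delay = base_delay      # assumed delay before the first enable, seconds
        self.max_delay = max_delay        # assumed upper bound on the backoff
        self.quiet_period = quiet_period  # assumed gap between disables that resets the backoff
        self.flaps = 0                    # disables counted since the last reset
        self.last_disable = None          # time of the most recent disable

    def on_disable(self, now):
        """Return the time at which to broadcast the disable (always right away)."""
        if self.last_disable is not None and now - self.last_disable >= self.quiet_period:
            self.flaps = 0  # no disable for a long time, forget earlier flaps
        self.flaps += 1
        self.last_disable = now
        return now

    def on_enable(self, now):
        """Return the earliest time at which to broadcast the re-enable."""
        delay = min(self.base_delay * 2 ** max(self.flaps - 1, 0), self.max_delay)
        return now + delay


policy = LazyEnablePolicy()
policy.on_disable(0.0)           # sent immediately
first = policy.on_enable(10.0)   # first flap: wait base_delay
policy.on_disable(100.0)         # second flap, still sent immediately
second = policy.on_enable(110.0) # second flap: wait twice as long
```

A caller would also need to cancel a pending enable if the channel goes down again before the returned time; that bookkeeping is left out here.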