[Lightning-dev] Improving the initial gossip sync

Christian Decker Mon, 05 Feb 2018 05:03:18 -0800

Hi everyone

When we started drafting the specification we decided to postpone the
topology syncronization mechanism until we have a better picture of the
kind of loads that are to be expected in the network, e.g., churn and
update rate, and instead implement a trivial gossip protocol to
distribute the topology updates. This includes the dreaded initial
synchonization dump that has caused some issues lately to all
implementations, given that we dump several thousands of updates, that
may require block metadata (short channel ID to txid conversion) lookup
and a UTXO lookup (is this channel still active?).


During the last call we decided to go for an incremental improvement,
rather than a full synchronization mechanism (IBLT, rsync, ...). So
let's discuss how that improvement could look like.

In the following I'll describe a very simple extension based on a
highwater mark for updates, and I think Pierre has a good proposal of
his own, that I'll let him explain.

We already have the `initial_routing_sync` feature bit, which (if
implemented) allows disabling the initial gossip synchronization, and
onyl forwarding newly received gossip messages. I propose adding a new
feature bit (6, i.e., bitmask 0x40) indicating that the `init` message
is extended with a u32 `gossip_timestamp`, interpreted as a UNIX
timestamp. The `gossip_timestamp` is the lowest `channel_update` and
`node_announcement` timestamp the recipient is supposed to send, any
older update or announcement is to be skipped. This allows the `init`
sender to specify how far back the initial synchronization should go.

The logic to forward announcements thus follows this outline:

 - Set `gossip_timestamp` for this peer
 - Iterate through all `channel_update`s that have a timestamp that is
   newer than the `gossip_timestamp` (skipping replaced ones as per BOLT
   07)
 - For each `channel_update` fetch the corresponding
   `channel_announcement` and the endpoints `node_announcement`.
 - Forward the messages in the correct order, i.e.,
 - `channel_announcement`, then `channel_update`, and then `node_announcement`

The feature bit is even, meaning that it is required from the peer,
since we extend the `init` message itself, and a peer that does not
support this feature would be unable to parse any future extensions to
the `init` message. Alternatively we could create a new
`set_gossip_timestamp` message that is only sent if both endpoints
support this proposal, but that could result in duplicate messages being
delivered between the `init` and the `set_gossip_timestamp` message and
it'd require additional messages.

`gossip_timestamp` is rather flexible, since it allows the sender to
specify its most recent update if it believes it is completely caught
up, or send a slightly older timestamp to have some overlap for
currently broadcasting updates, or send the timestamp the node was last
connected with the network, in the case of prolonged downtime.

The reason I'm using timestamp and not the blockheight in the short
channel ID is that we already use the timestamp for pruning. In the
blockheight based timestamp we might ignore channels that were created,
then not announced or forgotten, and then later came back and are now
stable.

I hope this rather simple proposal is sufficient to fix the short-term
issues we are facing with the initial sync, while we wait for a real
sync protocol. It is definitely not meant to allow perfect
synchronization of the topology between peers, but then again I don't
believe that is strictly necessary to make the routing successful.

Please let me know what you think, and I'd love to discuss Pierre's
proposal as well.

Cheers,
Christian
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev

[Lightning-dev] Improving the initial gossip sync

Reply via email to