[Lightning-dev] LN Summit 2022 Notes & Summary/Commentary

Olaoluwa Osuntokun Tue, 07 Jun 2022 19:39:09 -0700

Hi y'all,

Last week nearly 30 (!) Lightning developers and researchers gathered in
Oakland, California for three day to discuss a number of matters related to
the current state and evolution of the protocol.  This time around, we had
much better representation for all the major Lightning Node implementations
compared to the last LN Dev Summit (Zurich, Oct 2021).

Similar to the prior LN Dev Summit, notes were kept throughout the day that
attempted on a best effort basis to capture the relevant discussions,
decisions, and new relevant research or follow up areas to circle back on.
Last time around, I sent out an email that summarized some key takeaways
(from my PoV) of the last multi-day dev summit [1]. What follows in this
email is a similar summary/recap of the three day summit. Just like last
time: if you attended and felt I missed out on a key point, or inadvertently
misrepresented a statement/idea, please feel free to reply, correcting or
adding additional detail.

The meeting notes in full can be found here:
https://docs.google.com/document/d/1KHocBjlvg-XOFH5oG_HwWdvNBIvQgxwAok3ZQ6bnCW0/edit?usp=sharing

# Simple Taproot Channels

During the last summit, Taproot was a major discussion topic as though the
soft fork had been deployed, we we're all still watching the 🟩 's stack up
on the road to ultimately activation. Fast forward several months later and
Taproot has now been fully activated, with ecosystem starting to
progressively deploy more and more advanced systems/applications that take
advantage of the new features.

One key deployment model that came out of the last LN Dev summit was the
concept of an iterative roadmap that progressively revamped the system to
use more taprooty features, instead of a "big bang" approach that would
attempt to package up as many things as possible into one larger update. At
a high level the iterative roadmap proposed that we unroll an existing
larger proposal [2] into more bite sized pieces that can be incrementally
reviewed, implemented, and ultimately deployed (see my post on the LN Dev
Summit 2021 for more details).

## Extension BOLTs

Riiight before we started on the first day, I wrote up a minimal proposal
that attempted to tackle the first two items of the Taproot iterative
deployment schedule (musig2 funding outputs and simple tapscript mapping)
[3]. I called the proposal "Simple Taproot Channels" as it set out to do a
mechanical mapping of the current commitment and script structure to a more
taprooty domain. Rather than edit 4 or 5 different BOLTs with a series of
"if this feature bit applies" nested clauses, I instead opted to create a
new standalone "extension bolt" that defines _new_ behavior on top of the
existing BOLTs, referring to the BOLTs when necessary. The style of the
document was inspired by the "proposals" proposal (very meta), which was
popularized by cdecker and adopted by t-bast with his documents on
Trampoline and Blinded Paths.

If the concept catches on, extension BOLTs provide us with a new way to
extend the spec: rather than insert everything in-line, we could instead
create new standalone documents for larger features. Having a single self
contained document makes the proposal easier to review, and also gives the
author more room to provide any background knowledge, primaries, and also
rationale. Overtime, as the new extensions become widespread (eg: taproot is
the default channel type), we can fold in the extensions back to the main
set of "core" BOLTs (or make new ones as relevant).

Smaller changes to the spec like deprecating an old field or tightening up
some language will likely still follow the old approach of mutating the
existing BOLTs, but larger overhauls like the planned PTLC update may find
the extension BOLTs to be a better tool.

## Tapscript, Musig2, and Lightning

As mentioned above the Simple Taproot Channels proposal does two main
things:
1. Move the existing 2-of-2 p2wsh segwit v0 funding output to a _single
key_ p2tr output, with the single key actually being an aggregated musig2
key.

2. Map all our existing scripts to the tapscript domain, using the
internal key (keyspend path) for things like revocations, which an
potentially allow nodes to store less state for HTLCs.

Of the two components #1 is by far the trickiest. Musig2 is a very elegant
protocol (not to mention the spec which y'all should totally check out) but
as the signatures aren't deterministic (like RFC 6979 [5]), both signers
need to "protect themselves at all times" to ensure they don't ever re-use
nonces, which can lead to a private key leak (!!).

Rather than try to create some sort of psuedo-deterministic nonces scheme
(which maaybe works until the Blockstream Research team squints vaguely in
its direction), I opted to just make all nonces 100% ephemeral and tied to
the lifetime of a connection. Musig2 defines something called a public
nonces, which is actually two individual 33-byte nonces. This value needs to
be exchanged before signing can begin (but can be sent before sides know
they're aggregated keys). One important thing to note is that given that the
channels today have _asymmetric_ state, we actually need a _pair_ of public
nonces: one that I'll use to sign my commitment, and one I'll use to sign
yours. Lightning channels w/ symmetric state like eltoo can get by w/ only
exchange a single set of nonces, as there's only one message per state.

Nonce exchange takes place in a few places:

* During initial funding: I send my public nonce in the open_channel
message, you send yours in the accept_channel message. After this
exchange we can both generate signatures for the refund commitment
transactions.

* After the channel is "ready" we send another set of nonces, so we can
sign the next state. This is similar to the existing revocation key
exchange: I need your next nonce/key before I can sign a new state.

* Upon channel re-establishment a _new_ set of nonces is sent, as they're
100% ephemeral. The current draft also requires that if you were
re-transmitting a sig, then you use the _new_ nonces to sign again, as
it's possible you went to retransmit but left off an expired/trimmed
HLTC (could lead to nonce re-use and also needing to remember nonces).

* Each time I revoke my channel, I send to you a single nonce, my "local
nonce" (naming needs some work here), which lets you sign for a new
state.

* Each time I send a new sig, I also send you another nonce, my "remote"
nonce", which

* When I send a shutdown (co-op close) I send a single public nonce so we
can sign the next co-opc close offer.

* When I send a closing_signed I send another nonce so once you send your
offer, we sign another set.

The final flows aren't 100% yet finalized, as we'll need some
implementations drafted to make sure the nonce handling and script mapping
works out properly.

### Lightning Channels & Recursive Musig2

One other cool topic that came up is the concept of leveraging recursive
musig2 (so musig2 within musig2) to make channels even _more_ multi-sigy.
The benefit here is that Bob & Carol can each have their individual keys
(which might actually be aggregated keys themselves) and make a channel w/
Alice, who only knows of them as Barol, and doesn't know there're actually
another pair of keys at play. This is _really_ cool as it allows node
operators, wallets, and lightning platforms to experiment with various
key/signing trees that may add more security, redundancy, or flexibility.

When this first came up, someone brought up the fact that while the scheme
is "known" the initial paper as they weren't sure how to actually write a
proof for it. During the session, someone emailed one of the musig2 authors
asking for more details, and if it's safe to implement and roll out.
Thankfully they quickly replied and explained that the proof recursive musig
(pls someone correct me again here if I'm wrong) wasn't left out due to
impossibility, but that a proof in the existing Random Oracle Model (which
was used to derive a bound for the number of nonces needed) would lead to a
blow up in the number of nonces required. Attempting to write the proof in
some other model would likely lead to better results (proved w/ two nonces
as base musig2), but would end up being pretty complicated, so hard to read
and even review for correctness.

Assuming everything checks out, then a useful mental model explained by the
musig2 BIP author is a sort of tree structure. Assuming I'm a signer, and we
assemble the other signer as a sibling leaf in a binary tree, then I just
need to wait for the sibling nonce/key, before I can aggregate that into
the final value. So if there're 3 signers, I wait for the regular public
nonce, but the other signers sum their respective nonces into a single
nonce, then send that to me. A similar operation is carried out for key
aggregation, with the rest of the protocol being mostly the same.

Ultimately, even if wallets/nodes aren't ready to roll something like this
out today, we at least want to make sure the proposed flow is compatible
with Simple Taproot Channels, and ideally we'd have a toy implementation to
verify out understanding and show it's possible/sound. I volunteered to hack
up a simple recursive musig2 demo, as there doesn't seem to be any code in
the wild that implements it.

## Lightning Gossip

# Gossip V2: Now Or Later?

Another big topic related to Taproot was the question of how we should
update the gossip network: the gossip protocol today has all channels
validated by node, which requires that the nodes understand how to
reconstruct the funding output based on the set of advertised keys. The
protocol today assumes a segwit v0 p2wsh multi-sig is used. Assuming we had
everything implemented today, a node wouldn't be able to advertise its new
taproot channels to the rest of the public graph as they wouldn't understand
how to validate it.

This presents a new opportunity: we already need to rework gossip for
taproot, so should we go ahead and re-design the entire thing with an eye
for better privacy and future extensibility?

A proposal for the "re-design the entire thing" was floated in the past by
Rusty [6]. It does away with the strict coupling of channels to channel
announcements, and instead moves them to the _node_ level. Each node would
then advertise the set of "outputs" they have control of, which would then
be mapped to the total capacity of a node, without requiring that these
outputs self identify themselves on-chain as Lightning Channels. This also
opens up the door to different, potentially more privacy preserving
proofs-of-channel-ownership (something something zkp).

On the other hand, we could just follow the path of Simple Taproot Channels
and map musig2+schnorr onto the existing gossip network. This is less
changes in total, with the main benefit being the ability to only send 1 sig
(aggregated musig2 sig of keys) instead of 4 individual sigs. I made a very
lofty proposal in this direction here [7].

Ultimately we decided to take the "just musig2 aspects" from gossip v1.5
(not the real name), and the "let's refresh all the messages w/ TLV
goodness" from the gossip v2 proposal. This gives us a smaller package to
implement, and lets us potentially rejigger the messages to be more
extensible and remove cruft like the node color that almost nothing uses,
but we all validate/store.

The follow up work in this area is a more concrete proposal that updates the
relevant gossip messages to be taproot aware and TLV'd and also update the
set of requirements w.r.t _how_ to validate the channels in the first place
(so given two keys verify that applying the keyagg method of musig2 lead to
what' in the funding output).

Gossip v2 will likely happen "eventually", but the rather large design space
needs to be explored a bit more so we can properly analyze exactly what
privacy and extensibility properties we'll get out of it.

# Applying Mini Sketch to LN Gossip

One issue we have today, is that other than the initial scid query mechanism
added to the protocol, there isn't a great way to ensure you have all the
latest updates your peer has. These days, many nodes pretty aggressively
rate limit other nodes, so you might even have trouble sending out your
update in the first place. A recent paper (that I haven't actually fully
read yet) [8] analyzes the gossip network today to work out things like:
exactly how long it takes things to propagate, total bandwidth usage, etc.

Minisketch [9] (the grandchild of IBLTs ;)), is an efficient set
reconciliation protocol that was designed for Bitcoin p2p mempool syncing,
but can be applied to other protocols. An attendee has been working on
brushing off some older work to try to see how we could apply it to the LN
protocol to give nodes a more bandwidth efficient way to sync channel
updates, and also achieve better update propagation. This supplements some
existing investigative work done by Alex Meyers [10], with more concrete
designs w.r.t: what goes into the sketch, and the various size parameters
that need to be chosen.

# Channel Jamming

An attendee gave a talk on the various proposed solutions to channel
jamming, evaluating them on several axis including: punishment/monetary,
local vs global reputation, feasibility of mechanism design, UX
implications, and implementation complexity. The presenter didn't present a
new concrete proposal, but instead went through the various trade-offs,
ultimately concluding that they factor monetary penalties wherein the funds
are distributed across the route, rather than being provably burnt to
miners. However they alluded to some future upcoming work that attempts a
more rigorous analysis of the proposed solutions, their tradeoffs, and
potential ways we can parametrize solutions to be more effective (how much
should they pay, etc).

For those looking to brush up on the latest state of research/mitigations in
this area, I recommend this blog post by Bitmex research [11].

# Onion Messages & DoS

The topic of DoS concerns related to onion messages (in isolation, so not
necessarily related to things like bolt12 that take advantage of them came
up. During a white boarding session some argued that DoS isn't actually
much of an issue, as nodes can leverage "back propagation congestion
control" to inform the source (who may not actually be the sender) that
they'll start to drop or limit their packets, with each node doing this
iteratively until the actual source of the spam has been clamped. A few
lofty designs were thrown around, but more work needs to be done to
concretely specify something so it can be properly analyzed.

On the other side of the spectrum, rather than attempt to rate limit at the
node level (which each node having their own policy), nodes could opt
instead to forward _anything_ as long as the sender pays them enough. I
proposed a lofty approach that combined AMP and Onion Messages earlier this
year [12]. At a high level I make an AMP payment, which pushes extra coins
to all nodes on a route, and also drops off a special identifier to them.
When I send an onion message I include this identifier, with each node
performing their own account w.r.t the amount of bandwidth an ID has
remaining.

Ultimately a few implementations are pretty close to deploying their
implementation of onion messages, so no matter the intended use case, it
would be good to have code deployed along side to either rate limit or price
resource consumption accordingly. Otherwise, we might end up in a scenario
where DoS concerns were brushed aside, but end up being a huge issue later.

# Blinded Paths, QR Codes & Invoices

Blinded paths [13] is a new-er proposal to solve the "last mile" privacy
issue when receiving payments on LN. Today invoices to unadvertised channels
contain a set of hop hints, which are anchored at public nodes in the graph,
and also leak the scid of the unadvertised channel (points on-chain to the
channel receiving payments). A solution for the on-chain leak, SCID channel
aliases [15] are in the process of being widely rolled out. Channel aliases
instead use a random value in the invoice, allowing receiving nodes to break
that on-chain link and even rotate out the value periodically. With the
on-chain leak addressed, it's still the case that you give away your
"position" in the network, since as a sender I know that you're connected to
node N with a private channel.

Blinded paths address this node-level last mile privacy leak by replacing
hop hints with a new cryptographically blinded path. At a high level, the
receiver can construct a "hop hint" of length greater than 1, gather the
public keys of each of the nodes, then blinded them such that: the sender
can use them for path finding, but doesn't actually now exactly _which_
nodes they actually are.

There're two type of blinded paths: those in onion messages and those used
for actual payments. The latter variant was only formalized earlier this
year, as before people were mainly interested in using them to fetch BOLT 12
invoice via onion messages. One issue that pops up when attempting to use
blinded paths for normal payments is: the size of the resulting invoice. As
blinded paths are actually fragments of publicly known paths, as a receiver,
you want to stuff as many of them into the invoice as possible, since they
MUST be taken in order to route towards you. Invoices are typically
communicated via QR codes, which have a hard limit w.r.t the amount of
information that can be packed in. On the other hand for invoice fetching,
all that matters is that a path exists, so you can get by with stuffing less
of then in a QR code.

As a result, blinded paths aren't necessarily compatible with the widely
deployed BOLT 11 based QR codes. Instead a way to fetch invoice on demand is
required. Both BOLT-12 and LN-URL provide standardized ways for nodes to
fetch invoices, though their transport/signalling medium of choice differs.
Blinded routes are technically compatible with BOLT 11 invoices, but may be
hampered by the fact that you can only include so many routes.

Another consideration is that unlike hop hints, blinded paths require more
maintain once, as since they traverse public route, policy changes like a
fee update may invalidate an entire set set of routes. One proposed solution
is that forwarding nodes should observe their older policy for a period of
time (so a grace period), and also that blinded paths should have an
explicit expiry (similar to the existing invoice expiry).

One other implication is that the set of routes the receiver includes
matters
more: if they don't send enough or select them poorly, the sender may never
be
able to reach them even though a path exists in theory. More hands on
experience is needed so the spec authors can better guide implementations
and
wallets w.r.t best practices.

# Friend-of-a-friend Balance Sharing & Probing

A presentation was given on friend-of-a-friend balance sharing [16]. The
high level idea is that if we share _some_ information within a local
radius, then this gives the sender more information to choose a path that's
potentially more reliable. The tradeoff here ofc is that nodes will be
giving away more information that can potentially be used to ascertain
payment flows. In an attempt to minimize the amount of information shared,
the presenter proposed that just 2 bits of information be shared. Some
initial simulations showed that sharing local information actually performed
better than sharing global information (?). Some were puzzled w.r.t how
that's possible, but assuming the slides+methods are published others can
dig further into the model/parameter used to signal the inclusion.

Arguably, information like this is already available via probing, so one
line of thinking is something like: "why not just share _some_ of it" that
may actually lead to less internal failures? This is related to a sort of
tension between probing as a tool to increase payment reliability and also
as a tool to degrade privacy in the network. On the other hand, others
argued that probing provides natural cover traffic, since they actually
_are_ payments, though they may not be intended to succeed.

On the topic of channel probing, a sort of makeshift protocol was devised to
make it harder in practice, sacrificing too much on the axis of payment
reliability. At a high level it proposes that:

* nodes more diligently set both their max_htlc amount, as well as the
max_htlc_value_in_flight amount

* a 50ms (or select other value) timer should be used when sending out
commitment signatures, independent of HTLC arrival

* nodes leverage the max_htlc value to set a false ceiling on the max in
flight parameter

* for each HTLC sent/forwarded, select 2 other channels at random and
reduce the "fake" in-flight ceiling for a period of time

Some more details still need to be worked out, but some felt that this would
kick start more research into this area, and also make balance mapping
_slightly_ more difficult. From afar, it may be the case that achieving
balance privacy while also achieving acceptable levels of payment
reliability might be at odds with each other.

# Eltoo & ANYPREVOUT

One of the attendees is currently working on both fully implementing eltoo,
as well as specifying the exact channel funding+update interaction were it
to be rolled out align side the existing penalty based channels in the
protocol. As this version of eltoo is based on Taproot, we were able to
compare notes a bit to find the overlapping set of changes (nonce handling,
etc), which permits cross review of the proposals. This type of work is
cool, as only by fully implementing something end to end can you reaaally
work out all the edge cases and nuances.

ANYPREVOUT as hasn't changed significantly as of late. An attendee shared
plans to create a sort of mega all-future-feasible-soft-forks fork of
bitcoind, that would package up various unmerged (from bitcoind's) proposal
soft fork packages into an easy to run+install binary/project attached to a
signet. The hop is that by giving developers an easy way to interact with
proposed soft fork proposals (vs debasing some ancient pull request), wider
participation in testing/implementation/review can be facilitated.

# Trampoline Routing

There was a presentation on Trampoline routing explaining the motivation,
history, and current state of the proposal. The two main cases we've
narrowed down on are:

1. A mobile user doesn't necessarily want to sync the _entire_ graph, so
they can use trampoline to maintain a subset and still be able to send
payments.

2. A mobile user wants to be able to instate a payment, go offline, and
return at a later time to learn about the final state of the payment.

Use case #2 seems to be the most promising when combined with other
proposals for holding HTLCs at an origin node (call it an "LSP") [13].
Combined together, this would allow a mobile node to send a payment, then go
offline, with the LSP being able to retry the payment either continuously or
only when it knows the receiver is online to accept the payment. This may
potentially dramatically improve the UX for LN on mobile, as things suddenly
become a lot more asynchronous: I do something go offline, and the LSP node
can fulfil the payment in the background, then wait for me to come online to
settle the final. hop.

Trampoline can also be composed well with blinded routes (blinded route from
last trampoline to receiver) and also MPP (internal nodes can split
themselves with local information).

One added trade-off is that since the sender doesn't know the entire route,
they need to sort of overshoot w.r.t fees and CTLVs. This is something
we've known for a while, but until Trampoline is more widely rolled out, we
won't have a very good feel w.r.t how much extra senders will need to
allocate.

# Node Fee Optimization & Fee Rate Cards

Over the past few years, a common thread we've seen across successful
routing nodes is dynamic fee setting as a way to encourage/discourage
traffic. A routing nodes can utilize the set of fees of a channel to either
make it too expensive for other nodes to route through (it's already
depleted don't try unless you'll give be 10 mil sats, which no one would) or
very cheap, which'll incentivize flows in the other direction. If all nodes
are constantly sending out updates of this nature, then it can generate a
lot of traffic, and also sort of leak more balance information overtime
(which some nodes are already doing: using fees/max_htlc to communicate
available balances).

One attendee proposed allowing nodes to express a sort of fee gradient via a
static curve/bucket/function, instead of dynamically communicating what the
latest state of the fee+liquidity distribution looks like. A possible
manifestation could be a series of buckets, each of which with varying fee
rates. If your payment consumes 50% of channel balance, then you pay this
rate, otherwise if it's 5% you pay this rate, etc, etc. This might allow for
nodes to capture the same dynamics as they do with more dynamic fee updates,
but in a way that leaks less information and also consumes less gossip
bandwidth.

# The Return of Splicing

Splicing is one of those things that was discussed a long time ago, but was
never really fully implemented and rolled out. A few attendees have started
to take a closer look at the problem, building off of the interactive-tx
scheme that the dual-funding protocol extension uses. The main intricacy
discussed was if concurrent splices should be allowed or not, and if so, how
we would handle the various edge cases. As an example, if I propose a splice
to add more funds via my input, but that turns out to already be spent, then
the splicing transaction we created is invalid and can never be confirmed.
However if we allow _another_ splice to take place, and another one, and
another one, then ideally _one_ of them will confirm and serve as the new
anchor for the channel.

In a world of concurrent splices, the question of "what is my Lightning
balance" becomes even more murky. Wallet and implementations will likely
want to show the most pessimistic value, while also ensuring that the user
is able to effectively account for where all their funds and what they can
spend on/off chain.

# LN-URL + BOLT 12

LN-URL and BOLT 12 are both standardized ways that answer the question of:
how can I fetch an invoice from Bob? LN-URL differs from BOLT 12 in that it
uses the existing BOLT 11 invoice format, and uses an HTTP based protocol
for the negotiation process. BOLT 12 on the other hand is a suite of
protocol additions that includes (amongst other things) a new invoice format
(yay TLV!) and also a way to use onion messages to fetch an invoice _via_
the network.

Assuming blinded paths is widely rolled out, then the question of how
invoices are obtained becomes more important as blinded paths means that you
can't fit much in the traditional QR encoding. As a result, fetching
invoices on demand may become a more common place flow, with all its
trade-offs. There was a group discussion on how we could sort of unifying
everything either by allowing BOLT 12 to be used over LN-URL or the other
way around.

One proposal was to add a new query parameter to the normal LN-URL QR code
contents. This would mean that when a wallet goes to scan an LN-URL QR code,
if they know of the extra param, and what BOLT 12, they can just use the
enclosed offer to fetch the invoice.

An alternative proposal was to instead extract the BOLT 12 _invoice_ format
from the greater BOLT 12 "Offers" proposal. Assuming blinded paths is only
specified w.r.t BOLT 12 _invoices_, then this would mean an LN-URL extension
could be rolled out that allowed returning BOLT 12 invoice rather than BOLT
11 invoices. This would allow the ecosystem to slowly transition to a shared
invoice format, even if there may be fundamental disagreements w.r.t _how_
the invoices should be fetched in the first place.

It's worth noting that both of these proposals can be combined:

* If a wallet knows how to BOLT 12 Offers, they can take the enclosed
offer and run w/ it.

* If they don't know about Offers, but can send w/ the BOLT _invoice_
format, then they can fetch that and complete the payment.

This might be a nice middle ground as it would tend all
wallets/implementations to being able to decode and send w/ a BOLT 12
_invoice_, and leave the question of _how_ it should be fetched up to the
application/wallet/service. In the end, if paths never quite intersect, then
it's still possible to add route blinding to BOLT 11, with LN-URL sticking
with that invoice format to take advantage of the new privacy enhancements

[1]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-November/003336.html
[2]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-October/003278.html
[3]: https://github.com/lightning/bolts/pull/995
[4]: https://github.com/jonasnick/bips/blob/musig2/bip-musig2.mediawiki
[5]: https://datatracker.ietf.org/doc/html/rfc6979
[6]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-February/003470.html
[7]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-March/003526.html
[8]: https://arxiv.org/abs/2205.12737
[9]: https://bitcoinops.org/en/topics/minisketch/
[10]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-April/003551.html
[11]: https://blog.bitmex.com/preventing-channel-jamming/
[12]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-February/003498.html
[13]: https://github.com/lightning/bolts/pull/765
[14]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-October/003307.html
[15]: https://github.com/lightning/bolts/pull/910
[16]: https://github.com/lightning/bolts/pull/780

-- Laolu

_______________________________________________
Lightning-dev mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev

[Lightning-dev] LN Summit 2022 Notes & Summary/Commentary

Reply via email to