Hi y'all,

Last week nearly 30 (!) Lightning developers and researchers gathered in Oakland, California for three days to discuss a number of matters related to the current state and evolution of the protocol. This time around, we had much better representation for all the major Lightning node implementations compared to the last LN Dev Summit (Zurich, Oct 2021).
Similar to the prior LN Dev Summit, notes were kept throughout the day that attempted, on a best effort basis, to capture the relevant discussions, decisions, and new relevant research or follow up areas to circle back on. Last time around, I sent out an email that summarized some key takeaways (from my PoV) of the last multi-day dev summit [1]. What follows in this email is a similar summary/recap of the three day summit. Just like last time: if you attended and felt I missed out on a key point, or inadvertently misrepresented a statement/idea, please feel free to reply, correcting or adding additional detail.

The meeting notes in full can be found here: https://docs.google.com/document/d/1KHocBjlvg-XOFH5oG_HwWdvNBIvQgxwAok3ZQ6bnCW0/edit?usp=sharing

# Simple Taproot Channels

During the last summit, Taproot was a major discussion topic, as though the soft fork had been deployed, we were all still watching the 🟩s stack up on the road to ultimate activation. Fast forward several months and Taproot has now been fully activated, with the ecosystem starting to progressively deploy more and more advanced systems/applications that take advantage of the new features.

One key deployment model that came out of the last LN Dev Summit was the concept of an iterative roadmap that progressively revamps the system to use more taprooty features, instead of a "big bang" approach that would attempt to package up as many things as possible into one larger update. At a high level the iterative roadmap proposed that we unroll an existing larger proposal [2] into more bite sized pieces that can be incrementally reviewed, implemented, and ultimately deployed (see my post on the LN Dev Summit 2021 for more details).

## Extension BOLTs

Riiight before we started on the first day, I wrote up a minimal proposal that attempted to tackle the first two items of the Taproot iterative deployment schedule (musig2 funding outputs and simple tapscript mapping) [3].
I called the proposal "Simple Taproot Channels" as it set out to do a mechanical mapping of the current commitment and script structure to a more taprooty domain. Rather than edit 4 or 5 different BOLTs with a series of "if this feature bit applies" nested clauses, I instead opted to create a new standalone "extension BOLT" that defines _new_ behavior on top of the existing BOLTs, referring to the BOLTs when necessary. The style of the document was inspired by the "proposals" proposal (very meta), which was popularized by cdecker and adopted by t-bast with his documents on Trampoline and Blinded Paths.

If the concept catches on, extension BOLTs provide us with a new way to extend the spec: rather than insert everything in-line, we could instead create new standalone documents for larger features. Having a single self-contained document makes the proposal easier to review, and also gives the author more room to provide any background knowledge, primers, and rationale. Over time, as the new extensions become widespread (eg: taproot is the default channel type), we can fold the extensions back into the main set of "core" BOLTs (or make new ones as relevant). Smaller changes to the spec like deprecating an old field or tightening up some language will likely still follow the old approach of mutating the existing BOLTs, but larger overhauls like the planned PTLC update may find extension BOLTs to be a better tool.

## Tapscript, Musig2, and Lightning

As mentioned above, the Simple Taproot Channels proposal does two main things:

1. Move the existing 2-of-2 p2wsh segwit v0 funding output to a _single key_ p2tr output, with the single key actually being an aggregated musig2 key.
2. Map all our existing scripts to the tapscript domain, using the internal key (keyspend path) for things like revocations, which can potentially allow nodes to store less state for HTLCs.

Of the two components, #1 is by far the trickiest.
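To make item #1 a bit more concrete, here's a toy sketch of the key aggregation step, loosely modeled on the KeyAgg algorithm in the musig2 BIP [4]: every key gets a hash-derived coefficient (the second distinct key gets a coefficient of 1), and the weighted keys are combined into the single key that ends up in the funding output. The tiny multiplicative group, the 2-byte serialization, and all names here are assumptions for illustration only; real musig2 uses secp256k1 points, x-only keys and taproot tweaks, and this toy is not secure.

```python
import hashlib

# Toy Schnorr group (NOT secp256k1, NOT secure): P = 2*Q + 1 with Q prime,
# and G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP-340 style tagged hash: sha256(sha256(tag) || sha256(tag) || msg)."""
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()


def ser(element: int) -> bytes:
    """Serialize a toy group element (always < 2**16 here) to 2 bytes."""
    return element.to_bytes(2, "big")


def key_agg(pubkeys: list[int]) -> tuple[int, list[int]]:
    """Aggregate public keys into one key, returning (agg_key, coefficients)."""
    # Commit to the whole (ordered) key list so every coefficient depends on it.
    list_hash = tagged_hash("KeyAgg list", b"".join(ser(pk) for pk in pubkeys))
    # The first key that differs from pubkeys[0] gets coefficient 1.
    second = next((pk for pk in pubkeys if pk != pubkeys[0]), None)

    agg_key, coeffs = 1, []
    for pk in pubkeys:
        if pk == second:
            a = 1
        else:
            digest = tagged_hash("KeyAgg coefficient", list_hash + ser(pk))
            a = int.from_bytes(digest, "big") % Q
        coeffs.append(a)
        # In multiplicative notation, "a * pk" becomes pk ** a.
        agg_key = agg_key * pow(pk, a, P) % P
    return agg_key, coeffs


if __name__ == "__main__":
    alice_sk, bob_sk = 123, 456
    alice_pk, bob_pk = pow(G, alice_sk, P), pow(G, bob_sk, P)
    funding_key, coeffs = key_agg([alice_pk, bob_pk])
    # The aggregate key matches the coefficient-weighted sum of the secrets.
    assert funding_key == pow(G, (coeffs[0] * alice_sk + coeffs[1] * bob_sk) % Q, P)
    print("aggregated funding key:", funding_key)
```

The point of the per-key coefficients is that neither party can pick their key as a function of the other's to take over the aggregate key, which is what lets a plain single-key p2tr output stand in for the old 2-of-2 script.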
Musig2 is a very elegant protocol (not to mention the spec [4], which y'all should totally check out), but as the signatures aren't deterministic (in the way RFC 6979 [5] signatures are), both signers need to "protect themselves at all times" to ensure they don't ever re-use nonces, which can lead to a private key leak (!!). Rather than try to create some sort of pseudo-deterministic nonce scheme (which maaybe works until the Blockstream Research team squints vaguely in its direction), I opted to just make all nonces 100% ephemeral and tied to the lifetime of a connection.

Musig2 defines something called a public nonce, which is actually two individual 33-byte nonces. This value needs to be exchanged before signing can begin (but can be sent before the sides know their aggregated key). One important thing to note is that given that channels today have _asymmetric_ state, we actually need a _pair_ of public nonces: one that I'll use to sign my commitment, and one I'll use to sign yours. Lightning channels w/ symmetric state like eltoo can get by w/ only exchanging a single set of nonces, as there's only one message per state.

Nonce exchange takes place in a few places:

* During initial funding: I send my public nonce in the open_channel message, you send yours in the accept_channel message. After this exchange we can both generate signatures for the refund commitment transactions.
* After the channel is "ready" we send another set of nonces, so we can sign the next state. This is similar to the existing revocation key exchange: I need your next nonce/key before I can sign a new state.
* Upon channel re-establishment a _new_ set of nonces is sent, as they're 100% ephemeral. The current draft also requires that if you were re-transmitting a sig, then you use the _new_ nonces to sign again, as it's possible you went to retransmit but left off an expired/trimmed HTLC (which could lead to nonce re-use, and would also require remembering nonces).
* Each time I revoke a state, I send to you a single nonce, my "local nonce" (naming needs some work here), which lets you sign for a new state.
* Each time I send a new sig, I also send you another nonce, my "remote nonce", which
* When I send a shutdown (co-op close) I send a single public nonce so we can sign the next co-op close offer.
* When I send a closing_signed I send another nonce, so once you send your offer, we can sign another round.

The final flows aren't yet 100% finalized, as we'll need some implementations drafted to make sure the nonce handling and script mapping work out properly.

### Lightning Channels & Recursive Musig2

One other cool topic that came up is the concept of leveraging recursive musig2 (so musig2 within musig2) to make channels even _more_ multi-sigy. The benefit here is that Bob & Carol can each have their individual keys (which might actually be aggregated keys themselves) and make a channel w/ Alice, who only knows of them as Barol, and doesn't know there's actually another pair of keys at play. This is _really_ cool as it allows node operators, wallets, and lightning platforms to experiment with various key/signing trees that may add more security, redundancy, or flexibility.

When this first came up, someone brought up the fact that while the scheme is "known", it was left out of the initial paper as the authors weren't sure how to actually write a proof for it. During the session, someone emailed one of the musig2 authors asking for more details, and if it's safe to implement and roll out. Thankfully they quickly replied and explained that the proof for recursive musig (pls someone correct me again here if I'm wrong) wasn't left out due to impossibility, but because a proof in the existing Random Oracle Model (which was used to derive a bound for the number of nonces needed) would lead to a blow up in the number of nonces required.
Attempting to write the proof in some other model would likely lead to better results (provable w/ two nonces, as in base musig2), but would end up being pretty complicated, so hard to read and even to review for correctness.

Assuming everything checks out, then a useful mental model explained by the musig2 BIP author is a sort of tree structure. Assuming I'm a signer, and we assemble the other signers as a sibling leaf in a binary tree, then I just need to wait for the sibling nonce/key before I can aggregate that into the final value. So if there're 3 signers, I wait for the regular public nonce, but the other signers sum their respective nonces into a single nonce, then send that to me. A similar operation is carried out for key aggregation, with the rest of the protocol being mostly the same.

Ultimately, even if wallets/nodes aren't ready to roll something like this out today, we at least want to make sure the proposed flow is compatible with Simple Taproot Channels, and ideally we'd have a toy implementation to verify our understanding and show it's possible/sound. I volunteered to hack up a simple recursive musig2 demo, as there doesn't seem to be any code in the wild that implements it.

# Lightning Gossip

## Gossip V2: Now Or Later?

Another big topic related to Taproot was the question of how we should update the gossip network: the gossip protocol today has all channels validated by nodes, which requires that the nodes understand how to reconstruct the funding output based on the set of advertised keys. The protocol today assumes a segwit v0 p2wsh multi-sig is used. Even if we had everything implemented today, a node wouldn't be able to advertise its new taproot channels to the rest of the public graph, as other nodes wouldn't understand how to validate them. This presents a new opportunity: we already need to rework gossip for taproot, so should we go ahead and re-design the entire thing with an eye for better privacy and future extensibility?
A proposal for the "re-design the entire thing" path was floated in the past by Rusty [6]. It does away with the strict coupling of channels to channel announcements, and instead moves them to the _node_ level. Each node would then advertise the set of "outputs" they have control of, which would then be mapped to the total capacity of a node, without requiring that these outputs self-identify on-chain as Lightning channels. This also opens up the door to different, potentially more privacy preserving proofs-of-channel-ownership (something something zkp).

On the other hand, we could just follow the path of Simple Taproot Channels and map musig2+schnorr onto the existing gossip network. This means fewer changes in total, with the main benefit being the ability to only send 1 sig (an aggregated musig2 sig) instead of 4 individual sigs. I made a very lofty proposal in this direction here [7].

Ultimately we decided to take the "just musig2" aspects from gossip v1.5 (not the real name), and the "let's refresh all the messages w/ TLV goodness" from the gossip v2 proposal. This gives us a smaller package to implement, and lets us potentially rejigger the messages to be more extensible and remove cruft like the node color that almost nothing uses, but we all validate/store. The follow up work in this area is a more concrete proposal that updates the relevant gossip messages to be taproot aware and TLV'd, and also updates the set of requirements w.r.t _how_ to validate the channels in the first place (so given two keys, verify that applying the KeyAgg method of musig2 leads to what's in the funding output). Gossip v2 will likely happen "eventually", but the rather large design space needs to be explored a bit more so we can properly analyze exactly what privacy and extensibility properties we'll get out of it.
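The "TLV'd" part of that plan builds on the TLV stream format already defined in BOLT 1: each record is a BigSize type, a BigSize length, then the value; types must be strictly increasing; and under the "it's ok to be odd" rule, unknown odd types are skipped while unknown even types are an error. Here's a minimal Python sketch of encoding/decoding such a stream. The record type numbers in the usage lines are made up and don't correspond to any actual gossip v2 message.

```python
def encode_bigsize(n: int) -> bytes:
    """BigSize varint from BOLT 1 (big-endian, with 0xfd/0xfe/0xff prefixes)."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "big")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "big")
    return b"\xff" + n.to_bytes(8, "big")


# Width and smallest legal (minimally encoded) value for each multi-byte prefix.
_BIGSIZE_FORMS = {0xFD: (2, 0xFD), 0xFE: (4, 0x10000), 0xFF: (8, 0x100000000)}


def decode_bigsize(buf: bytes, off: int) -> tuple[int, int]:
    """Decode a BigSize at buf[off:], returning (value, new_offset)."""
    if off >= len(buf):
        raise ValueError("missing bigsize")
    prefix = buf[off]
    if prefix < 0xFD:
        return prefix, off + 1
    width, min_value = _BIGSIZE_FORMS[prefix]
    end = off + 1 + width
    if end > len(buf):
        raise ValueError("truncated bigsize")
    n = int.from_bytes(buf[off + 1 : end], "big")
    if n < min_value:
        raise ValueError("non-minimal bigsize encoding")
    return n, end


def encode_tlv_stream(records: dict[int, bytes]) -> bytes:
    """Serialize records in strictly increasing type order."""
    out = bytearray()
    for rtype in sorted(records):
        value = records[rtype]
        out += encode_bigsize(rtype) + encode_bigsize(len(value)) + value
    return bytes(out)


def decode_tlv_stream(buf: bytes, known_types: set[int]) -> dict[int, bytes]:
    """Parse a TLV stream, skipping unknown odd types, rejecting unknown even ones."""
    records: dict[int, bytes] = {}
    off, last_type = 0, -1
    while off < len(buf):
        rtype, off = decode_bigsize(buf, off)
        if rtype <= last_type:
            raise ValueError("TLV types must be strictly increasing")
        length, off = decode_bigsize(buf, off)
        if off + length > len(buf):
            raise ValueError("truncated TLV value")
        value = buf[off : off + length]
        off += length
        last_type = rtype
        if rtype in known_types:
            records[rtype] = value
        elif rtype % 2 == 0:
            raise ValueError(f"unknown even TLV type {rtype}")
    return records


# Usage, with hypothetical record types 1 and 3:
msg = encode_tlv_stream({3: b"\x01\x02", 1: b"hi"})
assert decode_tlv_stream(msg, known_types={1, 3}) == {1: b"hi", 3: b"\x01\x02"}
```

Using odd types for new optional fields is what would let a refreshed gossip message grow over time without breaking older nodes, while even types can be reserved for things every validator must understand.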
## Applying Minisketch to LN Gossip

One issue we have today is that, other than the initial scid query mechanism added to the protocol, there isn't a great way to ensure you have all the latest updates your peer has. These days, many nodes pretty aggressively rate limit other nodes, so you might even have trouble sending out your update in the first place. A recent paper (that I haven't actually fully read yet) [8] analyzes the gossip network today to work out things like: exactly how long it takes things to propagate, total bandwidth usage, etc.

Minisketch [9] (the grandchild of IBLTs ;)) is an efficient set reconciliation protocol that was designed for Bitcoin p2p mempool syncing, but can be applied to other protocols. An attendee has been working on brushing off some older work to try to see how we could apply it to the LN protocol to give nodes a more bandwidth efficient way to sync channel updates, and also achieve better update propagation. This supplements some existing investigative work done by Alex Myers [10], with more concrete designs w.r.t: what goes into the sketch, and the various size parameters that need to be chosen.

# Channel Jamming

An attendee gave a talk on the various proposed solutions to channel jamming, evaluating them on several axes including: punishment/monetary, local vs global reputation, feasibility of mechanism design, UX implications, and implementation complexity. The presenter didn't present a new concrete proposal, but instead went through the various trade-offs, ultimately concluding that they favor monetary penalties wherein the funds are distributed across the route, rather than being provably burnt to miners. However they alluded to some upcoming work that attempts a more rigorous analysis of the proposed solutions, their tradeoffs, and potential ways we can parametrize solutions to be more effective (how much should they pay, etc).
For those looking to brush up on the latest state of research/mitigations in this area, I recommend this blog post by BitMEX Research [11].

# Onion Messages & DoS

The topic of DoS concerns related to onion messages (in isolation, so not necessarily related to things like BOLT 12 that take advantage of them) came up. During a white boarding session some argued that DoS isn't actually much of an issue, as nodes can leverage "back propagation congestion control" to inform the source (who may not actually be the sender) that they'll start to drop or limit their packets, with each node doing this iteratively until the actual source of the spam has been clamped. A few lofty designs were thrown around, but more work needs to be done to concretely specify something so it can be properly analyzed.

On the other side of the spectrum, rather than attempt to rate limit at the node level (with each node having their own policy), nodes could instead opt to forward _anything_ as long as the sender pays them enough. I proposed a lofty approach that combined AMP and Onion Messages earlier this year [12]. At a high level I make an AMP payment, which pushes extra coins to all nodes on a route, and also drops off a special identifier to them. When I send an onion message I include this identifier, with each node performing their own accounting w.r.t the amount of bandwidth an ID has remaining.

Ultimately a few implementations are pretty close to deploying their implementation of onion messages, so no matter the intended use case, it would be good to have code deployed alongside to either rate limit or price resource consumption accordingly. Otherwise, we might end up in a scenario where DoS concerns were brushed aside, but end up being a huge issue later.

# Blinded Paths, QR Codes & Invoices

Blinded paths [13] are a new-er proposal to solve the "last mile" privacy issue when receiving payments on LN.
Today, invoices to unadvertised channels contain a set of hop hints, which are anchored at public nodes in the graph, and also leak the scid of the unadvertised channel (which points on-chain to the channel receiving payments). As a solution for the on-chain leak, SCID channel aliases [15] are in the process of being widely rolled out. Channel aliases instead use a random value in the invoice, allowing receiving nodes to break that on-chain link and even rotate out the value periodically.

With the on-chain leak addressed, it's still the case that you give away your "position" in the network, since as a sender I know that you're connected to node N with a private channel. Blinded paths address this node-level last mile privacy leak by replacing hop hints with a new cryptographically blinded path. At a high level, the receiver can construct a "hop hint" of length greater than 1, gather the public keys of each of the nodes, then blind them such that the sender can use them for path finding, but doesn't actually know exactly _which_ nodes they are.

There are two types of blinded paths: those in onion messages and those used for actual payments. The latter variant was only formalized earlier this year, as before people were mainly interested in using them to fetch BOLT 12 invoices via onion messages. One issue that pops up when attempting to use blinded paths for normal payments is the size of the resulting invoice. As blinded paths are actually fragments of publicly known paths, as a receiver you want to stuff as many of them into the invoice as possible, since they MUST be taken in order to route towards you. Invoices are typically communicated via QR codes, which have a hard limit w.r.t the amount of information that can be packed in. On the other hand, for invoice fetching all that matters is that a path exists, so you can get by with stuffing fewer of them in a QR code.
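To put rough numbers on that QR limit, here's a back-of-the-envelope sketch. It assumes the largest standard QR code (version 40, error correction level L, which holds at most 4296 alphanumeric characters), bech32's 5 bits per character, and an assumed per-path layout: introduction node id, blinding point, hop count, then per hop a blinded node id, a 2-byte length and the encrypted data. The 50-byte encrypted data size and the 300-character base invoice are made-up placeholders, not measured values, and TLV framing overhead is ignored.

```python
# Largest standard QR code: version 40, error correction level L.
QR_V40_L_ALPHANUMERIC_CHARS = 4296
BECH32_BITS_PER_CHAR = 5
POINT_LEN = 33  # compressed secp256k1 public key


def blinded_path_bytes(num_hops: int, enc_data_len: int = 50) -> int:
    """Assumed size of one blinded path: intro node + blinding point + hop count,
    then per hop a blinded node id, a 2-byte length and the encrypted data."""
    header = POINT_LEN + POINT_LEN + 1
    per_hop = POINT_LEN + 2 + enc_data_len
    return header + num_hops * per_hop


def max_blinded_paths(num_hops: int, base_invoice_chars: int = 300,
                      enc_data_len: int = 50) -> int:
    """How many such paths fit in the leftover bech32 character budget."""
    spare_chars = QR_V40_L_ALPHANUMERIC_CHARS - base_invoice_chars
    spare_bits = spare_chars * BECH32_BITS_PER_CHAR
    return spare_bits // (blinded_path_bytes(num_hops, enc_data_len) * 8)


for hops in (1, 2, 3):
    print(hops, "hop paths:", max_blinded_paths(hops))
```

Under these assumptions a maxed-out QR code fits about 16 one-hop, 10 two-hop, or 7 three-hop paths, and that's at a density well beyond what's comfortable to scan with a phone camera; the smaller QR versions used in practice fit far fewer.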
As a result, blinded paths aren't necessarily compatible with the widely deployed BOLT 11 based QR codes. Instead, a way to fetch invoices on demand is required. Both BOLT 12 and LN-URL provide standardized ways for nodes to fetch invoices, though their transport/signalling medium of choice differs. Blinded routes are technically compatible with BOLT 11 invoices, but may be hampered by the fact that you can only include so many routes.

Another consideration is that unlike hop hints, blinded paths require more maintenance: since they traverse public routes, policy changes like a fee update may invalidate an entire set of routes. One proposed solution is that forwarding nodes should honor their older policy for a period of time (so a grace period), and also that blinded paths should have an explicit expiry (similar to the existing invoice expiry). One other implication is that the set of routes the receiver includes matters more: if they don't send enough or select them poorly, the sender may never be able to reach them even though a path exists in theory. More hands-on experience is needed so the spec authors can better guide implementations and wallets w.r.t best practices.

# Friend-of-a-friend Balance Sharing & Probing

A presentation was given on friend-of-a-friend balance sharing [16]. The high level idea is that if we share _some_ information within a local radius, then this gives the sender more information to choose a path that's potentially more reliable. The tradeoff here ofc is that nodes will be giving away more information that can potentially be used to ascertain payment flows. In an attempt to minimize the amount of information shared, the presenter proposed that just 2 bits of information be shared. Some initial simulations showed that sharing local information actually performed better than sharing global information (?).
Some were puzzled w.r.t how that's possible, but assuming the slides+methods are published, others can dig further into the model/parameters used to signal the inclusion. Arguably, information like this is already available via probing, so one line of thinking is: "why not just share _some_ of it?", which may actually lead to fewer internal failures. This is related to a sort of tension between probing as a tool to increase payment reliability and also as a tool to degrade privacy in the network. On the other hand, others argued that probing provides natural cover traffic, since probes actually _are_ payments, though they may not be intended to succeed.

On the topic of channel probing, a sort of makeshift protocol was devised to make it harder in practice, without sacrificing too much on the axis of payment reliability. At a high level it proposes that:

* nodes more diligently set both their max_htlc amount, as well as the max_htlc_value_in_flight amount
* a 50ms (or select other value) timer should be used when sending out commitment signatures, independent of HTLC arrival
* nodes leverage the max_htlc value to set a false ceiling on the max in flight parameter
* for each HTLC sent/forwarded, select 2 other channels at random and reduce the "fake" in-flight ceiling for a period of time

Some more details still need to be worked out, but some felt that this would kick start more research into this area, and also make balance mapping _slightly_ more difficult. From afar, it may be the case that balance privacy and acceptable levels of payment reliability are at odds with each other.

# Eltoo & ANYPREVOUT

One of the attendees is currently working on both fully implementing eltoo, as well as specifying the exact channel funding+update interaction were it to be rolled out alongside the existing penalty based channels in the protocol.
As this version of eltoo is based on Taproot, we were able to compare notes a bit to find the overlapping set of changes (nonce handling, etc), which permits cross review of the proposals. This type of work is cool, as only by fully implementing something end to end can you reaaally work out all the edge cases and nuances. ANYPREVOUT itself hasn't changed significantly as of late.

An attendee shared plans to create a sort of mega all-future-feasible-soft-forks fork of bitcoind, which would package up various soft fork proposals that haven't been merged into bitcoind into an easy to run+install binary/project attached to a signet. The hope is that by giving developers an easy way to interact with proposed soft forks (vs rebasing some ancient pull request), wider participation in testing/implementation/review can be facilitated.

# Trampoline Routing

There was a presentation on Trampoline routing explaining the motivation, history, and current state of the proposal. The two main use cases we've narrowed down on are:

1. A mobile user doesn't necessarily want to sync the _entire_ graph, so they can use trampoline to maintain a subset and still be able to send payments.
2. A mobile user wants to be able to initiate a payment, go offline, and return at a later time to learn about the final state of the payment.

Use case #2 seems to be the most promising when combined with other proposals for holding HTLCs at an origin node (call it an "LSP") [14]. Combined together, this would allow a mobile node to send a payment, then go offline, with the LSP being able to retry the payment either continuously or only when it knows the receiver is online to accept the payment. This may potentially dramatically improve the UX for LN on mobile, as things suddenly become a lot more asynchronous: I do something, go offline, and the LSP node can fulfil the payment in the background, then wait for me to come online to settle the final hop.
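A hypothetical model of the async flow in use case #2 might look like the sketch below: the LSP holds the sender's payment, forwards it once it learns the receiver is online, and fails it back if the hold expires first. The class and method names, and the block-height expiry, are made up for illustration; none of the proposals discussed define this API.

```python
from dataclasses import dataclass, field


@dataclass
class HeldPayment:
    payment_hash: str
    receiver: str
    expiry_height: int  # hypothetical: give up once the chain reaches this height


@dataclass
class AsyncLSP:
    """Toy LSP that holds outgoing payments on behalf of an offline sender."""
    held: list[HeldPayment] = field(default_factory=list)
    forwarded: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)

    def hold(self, payment: HeldPayment) -> None:
        # The sender can go offline right after handing off the payment.
        self.held.append(payment)

    def on_receiver_online(self, receiver: str) -> None:
        # Forward (i.e. retry) everything that was waiting on this receiver.
        for p in [p for p in self.held if p.receiver == receiver]:
            self.held.remove(p)
            self.forwarded.append(p.payment_hash)

    def on_new_block(self, height: int) -> None:
        # Fail back anything whose hold period has run out.
        for p in [p for p in self.held if height >= p.expiry_height]:
            self.held.remove(p)
            self.failed.append(p.payment_hash)


lsp = AsyncLSP()
lsp.hold(HeldPayment("aa", "carol", expiry_height=800_010))
lsp.hold(HeldPayment("bb", "dave", expiry_height=800_005))
lsp.on_new_block(800_006)        # dave never showed up: "bb" is failed back
lsp.on_receiver_online("carol")  # carol comes online: "aa" is forwarded
```

The sender only needs to reconnect later to learn which of its payments ended up in `forwarded` vs `failed`.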
Trampoline can also be composed well with blinded routes (a blinded route from the last trampoline to the receiver) and also MPP (intermediate trampoline nodes can split the payment themselves using local information). One added trade-off is that since the sender doesn't know the entire route, they need to sort of overshoot w.r.t fees and CLTVs. This is something we've known for a while, but until Trampoline is more widely rolled out, we won't have a very good feel w.r.t how much extra senders will need to allocate.

# Node Fee Optimization & Fee Rate Cards

Over the past few years, a common thread we've seen across successful routing nodes is dynamic fee setting as a way to encourage/discourage traffic. A routing node can utilize the set of fees of a channel to either make it too expensive for other nodes to route through (it's already depleted, don't try unless you'll give me 10 mil sats, which no one would) or very cheap, which'll incentivize flows in the other direction. If all nodes are constantly sending out updates of this nature, then it can generate a lot of traffic, and also sort of leak more balance information over time (which some nodes are already doing: using fees/max_htlc to communicate available balances).

One attendee proposed allowing nodes to express a sort of fee gradient via a static curve/bucket/function, instead of dynamically communicating what the latest state of the fee+liquidity distribution looks like. A possible manifestation could be a series of buckets, each with a varying fee rate: if your payment consumes 50% of the channel balance, then you pay this rate, otherwise if it's 5% you pay this rate, etc, etc. This might allow nodes to capture the same dynamics as they do with more dynamic fee updates, but in a way that leaks less information and also consumes less gossip bandwidth.

# The Return of Splicing

Splicing is one of those things that was discussed a long time ago, but was never really fully implemented and rolled out.
A few attendees have started to take a closer look at the problem, building off of the interactive-tx scheme that the dual-funding protocol extension uses. The main intricacy discussed was whether concurrent splices should be allowed or not, and if so, how we would handle the various edge cases. As an example, if I propose a splice to add more funds via my input, but that input turns out to already be spent, then the splicing transaction we created is invalid and can never be confirmed. However if we allow _another_ splice to take place, and another one, and another one, then ideally _one_ of them will confirm and serve as the new anchor for the channel.

In a world of concurrent splices, the question of "what is my Lightning balance" becomes even more murky. Wallets and implementations will likely want to show the most pessimistic value, while also ensuring that the user is able to effectively account for where all their funds are and what they can spend on/off chain.

# LN-URL + BOLT 12

LN-URL and BOLT 12 are both standardized ways that answer the question of: how can I fetch an invoice from Bob? LN-URL differs from BOLT 12 in that it uses the existing BOLT 11 invoice format, and uses an HTTP based protocol for the negotiation process. BOLT 12 on the other hand is a suite of protocol additions that includes (amongst other things) a new invoice format (yay TLV!) and also a way to use onion messages to fetch an invoice _via_ the network.

Assuming blinded paths are widely rolled out, the question of how invoices are obtained becomes more important, as blinded paths mean that you can't fit much in the traditional QR encoding. As a result, fetching invoices on demand may become a more commonplace flow, with all its trade-offs. There was a group discussion on how we could sort of unify everything, either by allowing BOLT 12 to be used over LN-URL or the other way around. One proposal was to add a new query parameter to the normal LN-URL QR code contents.
This would mean that when a wallet goes to scan an LN-URL QR code, if they know of the extra param and want BOLT 12, they can just use the enclosed offer to fetch the invoice.

An alternative proposal was to instead extract the BOLT 12 _invoice_ format from the greater BOLT 12 "Offers" proposal. Assuming blinded paths are only specified w.r.t BOLT 12 _invoices_, then this would mean an LN-URL extension could be rolled out that allowed returning BOLT 12 invoices rather than BOLT 11 invoices. This would allow the ecosystem to slowly transition to a shared invoice format, even if there may be fundamental disagreements w.r.t _how_ the invoices should be fetched in the first place.

It's worth noting that both of these proposals can be combined:

* If a wallet knows how to use BOLT 12 Offers, they can take the enclosed offer and run w/ it.
* If they don't know about Offers, but can send w/ the BOLT 12 _invoice_ format, then they can fetch that and complete the payment.

This might be a nice middle ground, as it would nudge all wallets/implementations toward being able to decode and send w/ a BOLT 12 _invoice_, and leave the question of _how_ it should be fetched up to the application/wallet/service.
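The combined approach boils down to a simple preference order on the wallet side. Here's a minimal sketch; the `ScannedCode`/`WalletCaps` types, the `offer` field standing in for the proposed extra LN-URL query parameter, and the `Flow` labels are all hypothetical, since neither proposal had been specified at the time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Flow(Enum):
    OFFER_VIA_ONION_MESSAGES = "offer"      # full BOLT 12: fetch via onion messages
    BOLT12_INVOICE_VIA_LNURL = "lnurl-b12"  # LN-URL returns a BOLT 12 invoice
    BOLT11_INVOICE_VIA_LNURL = "lnurl-b11"  # today's LN-URL behavior


@dataclass
class ScannedCode:
    lnurl: str
    offer: Optional[str] = None  # hypothetical extra query param carrying an offer


@dataclass
class WalletCaps:
    supports_offers: bool
    supports_bolt12_invoices: bool


def choose_flow(code: ScannedCode, caps: WalletCaps) -> Flow:
    # 1. Offers-aware wallets take the enclosed offer and run w/ it.
    if caps.supports_offers and code.offer is not None:
        return Flow.OFFER_VIA_ONION_MESSAGES
    # 2. Wallets that can decode/pay BOLT 12 invoices still fetch one over LN-URL.
    if caps.supports_bolt12_invoices:
        return Flow.BOLT12_INVOICE_VIA_LNURL
    # 3. Everyone else falls back to plain BOLT 11 over LN-URL.
    return Flow.BOLT11_INVOICE_VIA_LNURL
```

Since each step only adds an option, older wallets keep working unchanged, while the shared BOLT 12 invoice format can spread independently of how invoices get fetched.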
In the end, if paths never quite intersect, then it's still possible to add route blinding to BOLT 11, with LN-URL sticking with that invoice format to take advantage of the new privacy enhancements.

[1]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-November/003336.html
[2]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-October/003278.html
[3]: https://github.com/lightning/bolts/pull/995
[4]: https://github.com/jonasnick/bips/blob/musig2/bip-musig2.mediawiki
[5]: https://datatracker.ietf.org/doc/html/rfc6979
[6]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-February/003470.html
[7]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-March/003526.html
[8]: https://arxiv.org/abs/2205.12737
[9]: https://bitcoinops.org/en/topics/minisketch/
[10]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-April/003551.html
[11]: https://blog.bitmex.com/preventing-channel-jamming/
[12]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-February/003498.html
[13]: https://github.com/lightning/bolts/pull/765
[14]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-October/003307.html
[15]: https://github.com/lightning/bolts/pull/910
[16]: https://github.com/lightning/bolts/pull/780

-- Laolu
_______________________________________________ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev