Dear ZmnSCPxj,

thank you very much for this mail and in particular the openness to tinker about a protocol change with respect to the transport layer of HTLCs / PTLCs! I fully agree that we should think about adapting our onion transport layer in a way that supports local payment splits and merges, to resemble the flowing nature of payments more closely. While such an update might be tricky, I think right now is a perfect moment for this discussion, as we need a full upgrade anyway when going to PTLCs (which might in fact even be helpful for a local splitting / merging logic, as the secrets would be additive / linear). Before I elaborate, let me state a few things and correct an error in your mail about the convex nature of the fee function and the problems of the min-cost flow solver.

I agree with you (and AJ) that there is a problem with the base fee that comes from overpaying when dissecting a flow into paths which share a channel. However, this is not related to the convexity issue. In a recent mail [0] I conducted the calculation demonstrating why the current fee function f(x) = x*r + b is not even linear (every linear function would be convex). As described in the paper, the problem for min-cost flow solving algorithms arises when the cost function (in that case the fee function) is neither linear nor convex. The issue that breaks convexity is really subtle here and comes in when going from 0 flow to some flow. In that sense I apologize that the paper might be slightly misleading in its presentation.

While convexity is often checked with the second-derivative argument, our function is in fact defined on integers and thus not even differentiable, which is why the second-derivative argument that we use to show the convexity of the negative log probabilities is not transferable. Let us instead go with the Wikipedia definition of convexity [1], which states: "In mathematics, a real-valued function is called convex if the line segment between any two points on the graph of the function lies above the graph between the two points." So you would test convexity by making sure that for any two values x_1 and x_2 the line connecting the points (x_1, f(x_1)) and (x_2, f(x_2)) lies above all other points (x, f(x)) for x in {x_1, ..., x_2}. In our case, because f(0) = 0 and f(2) = 2r + b, the line connecting those two points is l(x) = ((2r + b)/2)*x = (r + b/2)*x, which means that l(1) = r + b/2. However, f(1) = r + b, which is larger than l(1) for any positive value of b.
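The counterexample above can also be checked numerically. A minimal sketch (the concrete values of r and b are purely illustrative):

```python
# Convexity counterexample for the fee function, following the argument above.
# fee(x) = x*r + b for x >= 1, but fee(0) = 0 (no payment, no fee).
def fee(x, r, b):
    return 0 if x == 0 else x * r + b

def violates_convexity(r, b):
    # Chord from (0, fee(0)) to (2, fee(2)), evaluated at x = 1:
    # l(1) = (2r + b) / 2 = r + b/2, while fee(1) = r + b.
    chord_at_1 = (fee(0, r, b) + fee(2, r, b)) / 2
    return fee(1, r, b) > chord_at_1

print(violates_convexity(r=1, b=1000))  # True: any b > 0 breaks convexity
print(violates_convexity(r=1, b=0))     # False: with zerobasefee f is convex
```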
(Note that again with zerobasefee, b = 0, we have f(1) = r = l(1), which would be sufficient.)

If you implement the capacity scaling algorithm for a min-cost flow solver you will see that the algorithm linearizes the convex costs all the time by switching to a unit cost in each delta-scaling phase (as described in sections 9, 10.2 and 14.1 through 14.5 of the book [2] that we refer to in the paper and that Stefan already mentioned to you). This seems very similar to the thoughts you had in your mail (though even after reading it three times I cannot verify how you turned the base fee into a proportional term by switching to a unit-sized payment amount). You will also recognize that the base fee is not linearizable as a unit cost and will break the reduced cost optimality criterion in the delta phases of the algorithm. That being said, in our mainnet tests we actually used 10k and 100k sats as a fixed minimum unit size for HTLCs in the flow computation, but more about that in the already announced and soon to come mail about the mainnet tests.

Long story short: while the base fee indeed makes it a problem to dissect a flow into paths, the issue seems to be much more severe because of this discontinuous jump when going from f(0) to f(1).

All that being said, I am very delighted to see that you propose a protocol change towards "whole flow payments" (please allow me to still call them payment flows, as our paper title suggests). In my recent mail [0] I said I would want to write another mail about our mainnet test results and the consequences for the protocol, node operators, users etc. One of those consequences goes fully along with the idea that you described here, and thus I very much like / appreciate your idea / proposal!
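Coming back to the overpayment when dissecting a flow into paths that share a channel, here is a small sketch of the effect using the example flow from your mail (the base fee and rate are arbitrary placeholders, and fee accumulation along the path is ignored for simplicity):

```python
# Sketch: base fee paid per *path* after dissection vs. per *channel* as the
# flow-level cost function f(x) = x*r + b would charge. Values are assumed.
BASE, RATE = 1000, 0.001  # msat base fee, proportional rate (placeholders)

def path_fee(amount, hops):
    # Each hop on a path charges its own base fee plus proportional fee.
    return sum(BASE + amount * RATE for _ in range(hops))

# Dissection of the 3000msat flow into two paths sharing S->A and C->D:
paths = [(1000, 4), (2000, 3)]  # (amount, hop count): S-A-B-C-D and S-A-C-D
actual = sum(path_fee(a, h) for a, h in paths)

# What the flow-level cost function would allocate per channel:
channels = [3000, 1000, 1000, 2000, 3000]  # S->A, A->B, B->C, A->C, C->D
flow_level = sum(BASE + x * RATE for x in channels)

# The two shared channels (S->A and C->D) each pay BASE twice:
print(actual - flow_level)  # → 2000.0, i.e. exactly 2 * BASE
```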
As the other mail was too long anyway, let me quickly copy and paste the text about the five consequences for the protocol from my draft, and note that the second one goes fully along with your proposal:

Consequences for the Protocol
========================

1.) As we send larger amounts we confirmed what many people already knew: stuck and hanging HTLCs are becoming more of a problem. The main reason that in our experiments we could not deliver the full amount was that we "lost" HTLCs, as they did neither arrive at the destination in time nor return as errors to the sender quickly enough before the MPP timeout. We very much need a cancelable payment mechanism [3]. I know taproot and PTLCs are coming, but I think we should really emphasize this aspect. The focus is of particular interest because it is bad that the current way of canceling an onion with the help of PTLCs can still lead to channel failure on the remote channel where the PTLC is stuck, and can still bind the local liquidity for a long time, potentially preventing the sender from conducting the full payment and also binding the liquidity of the residual flow.

2.) When dissecting a payment flow into several paths it can and does happen that the amount `amt` sent through a remote channel will actually be coming from several inbound channels of the sending node of that channel, or be going to several outbound channels of the receiving node of that channel. To me it seems reasonable to adapt the transport of HTLCs (or the redesign with PTLCs) to have a transport mechanism that better reflects this reality of flows. In particular I envision a new Lightning message called `update_update_htlc` to complement `update_add_htlc`. This message would allow increasing or reducing the amount of an HTLC committed / offered to a channel, so that folks would not have to have parallel HTLCs in a channel that all belong to the same payment.
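To make point 2.) a bit more tangible, here is a rough sketch of what such a message could carry. The field names are purely hypothetical (loosely modeled on `update_add_htlc`), not a spec proposal:

```python
# Hypothetical sketch of the proposed `update_update_htlc` message.
# Field names are my own guesses, not part of any BOLT.
from dataclasses import dataclass

@dataclass
class UpdateUpdateHtlc:
    channel_id: bytes   # 32-byte channel identifier, as in update_add_htlc
    id: int             # id of the already-offered HTLC being amended
    amount_msat: int    # new amount committed / offered for this HTLC

# A forwarding node merging an additional 1000msat in-flow into an existing
# 1500msat HTLC could raise it instead of adding a parallel HTLC:
msg = UpdateUpdateHtlc(channel_id=b"\x00" * 32, id=7, amount_msat=2500)
print(msg.amount_msat)  # → 2500
```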
Maybe the channel factory ideas on eltoo channels that AJ Towns mentioned might be another direction to achieve this. Also, I believe similar ideas have been discussed with link-AMP in the past.

3.) Especially when optimizing for success probabilities, I believe there is a strong argument to solve the min-cost flow not for the amount to be paid but rather in a way that the expected value to be delivered matches the amount that one wishes to send. We can achieve this by computing the optimal flow for a larger amount and then sending out redundant but cancelable onions, similar to the mechanism described in the Boomerang paper [4], which I am happy to recommend looking at again.

4.) I want to recall that nodes probe the channels of their neighbors anyway while delivering the payment. If we had friend-of-a-friend balance sharing [5] together with a fee-free rebalancing protocol, we would drastically reduce the uncertainty around the sender and the recipient (to be included in blinded path routing hints in the invoices or offers) and improve the overall speed for larger payments to arrive, because we would start out with more information about the neighborhood of the sender and the recipient. Additionally, the prior probability changes from a uniform distribution to a normal distribution, as shown in figure 13 of the probabilistic payment paper [6], which again reduces the necessary number of attempts, as shown in figure 14 of the same paper.

5.) If we either believe it is best to drop the base fee and/or if we see that users stop using it, we might want to think about deprecating the base fee on the protocol level as well, or (if I recall correctly, as has been done with the "do not fragment" bit / flag in IP) reinterpret the semantics of the base fee.
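Point 3.) above can be sketched in a few lines: overprovision the flow with redundant, cancelable (Boomerang-style) onions so that the *expected* delivered value reaches the target. The amounts and success probabilities here are made-up examples:

```python
# Sketch for consequence 3.): size a redundant flow by expected value.
# Probabilities are illustrative assumptions, not estimated from real data.
def expected_delivered(onions):
    # onions: list of (amount_msat, estimated success probability)
    return sum(amount * p for amount, p in onions)

target_msat = 1_000_000
# Flow computed for 1_300_000 msat, dissected into three redundant onions;
# excess deliveries would be canceled at the recipient:
onions = [(500_000, 0.9), (500_000, 0.8), (300_000, 0.7)]

print(expected_delivered(onions))                 # → 1060000.0
print(expected_delivered(onions) >= target_msat)  # → True
```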
As you can see, I was about to introduce this idea not because I was worried about paying the base fee several times in the dissection (I guess the base fee will go away anyway), but for more practical reasons regarding HTLC space in commitment transactions (which I think is another excellent reason) to think more about a transport layer that respects flows, as it also addresses the fixed cost issues (not in terms of signatures but in terms of potential onchain fees and transaction size).

That being said, I am not sure how easy it will be for us to create an onion transport that actually communicates the flow, but I was arguing that maybe with a simple new Lightning message called `update_update_htlc`, which would allow updating the amount of a given HTLC, we could create something similar to local payment splitting and merging. In fact this could work in particular if an HTLC of size X arrives at a channel that has less liquidity than X. The forwarding node could back-propagate the value that was possible by lowering all upstream HTLCs to the liquidity available in the downstream channel (though I haven't fully thought through all the consequences with respect to security yet).

with kind regards

Rene

[0]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-August/003203.html
[1]: https://en.wikipedia.org/wiki/Convex_function
[2]: Network Flows: Theory, Algorithms, and Applications, book by James B. Orlin, Ravindra K. Ahuja and Thomas L.
Magnanti
[3]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-June/002029.html
[4]: https://arxiv.org/pdf/1910.01834.pdf
[5]: https://github.com/lightningnetwork/lightning-rfc/pull/780
[6]: https://arxiv.org/abs/2103.08576

On Mon, Aug 30, 2021 at 12:58 PM ZmnSCPxj via Lightning-dev <lightning-dev@lists.linuxfoundation.org> wrote:

> Good morning list,
>
> It seems to me that the cost function defined in Pickhardt-Richter
> can in fact include a base fee:
>
> -log(success_probability) + fudging_factor * amount * prop_fee +
> fudging_factor * base_fee
>
> (Rant: why do mathists prefer single-symbol var names, in software
> engineering there is a strong injunction **against** single-char
> var names, with exceptions for simple numeric iteration loop vars,
> I mean keeping track of the meaning of each single-symbol var name
> takes up working memory space, which is fairly limited on a wetware
> CPU, which is precisely why in software engineering there is an
> injunction against single-char var names, srsly.)
>
> It seems to me that the above is "convex", as the `fudging_factor`
> for a run will be constant, and the `base_fee` for the channel
> would also be constant, and since it seems to me, naively, that
> the paper defines "convex" as "the second derivative >= 0 for all
> `amount`", and since the derivative of a constant is 0, the above
> would still remain convex.
> (I am not a mathist and this conjecture might be completely asinine.)
>
> So, it seems to me that the *real* reason for `#zerobasefee` is
> not that the **mincostflow** algorithm cannot handle non-zero
> `base_fee`, but rather that the **dissect** phase afterwards
> cannot handle non-zero `base_fee`.
>
> For example, suppose the mincostflow algorithm were instructed to
> deliver 3,000msat from `S` to `D`, and returned the following flow:
>
> S -->3000--> A ->1000-> B
>              |          |
>              |         1000
>              |          v
>              +-->2000-> C ->3000-> D
>
> In the "dissect" phase afterwards, the above flow solution would have
> to be split into two sub-payments, a 1000 sub-payment `S-A-B-C-D`
> and a 2000 sub-payment `S-A-C-D`.
>
> However, this does mean that the channels `C->D` and `S->A` in
> the above flow will be paid *twice* the base cost that the above
> cost function would have computed, because in reality two independent
> payments pass through those channels.
>
> Thus, the current Pickhardt-Richter scheme cannot work with non-zero
> base fees if it were modified to consider fees, but not due to the
> mincostflow algorithm, rather due to the dissect algorithm.
>
> ### Converting Non-Zero Base Fees To Proportional Fees
>
> An alternative solution would be to optimize for a *maximum* fee
> rather than optimize for the *exact* fee.
>
> We can do this by virtually splitting up the entire payment into
> smaller bunches of value, and asking mincostflow to solve individual
> bunches rather than individual msats.
>
> For example, we can decide that the smallest practical HTLC would
> be 1000 msat.
> Let us call this the "payment unit", and the mincostflow algo
> solves for paying in these payment units instead of 1msat units.
> (Actual implementations would need to have some heuristic or
> reasoned rule-of-thumb assumption on what the smallest practical
> HTLC would be.)
>
> In our example, suppose we need to send 2001msat from `S` to `D`,
> with the payment unit being 1000 msat.
> Then we would actually have the mincostflow algorithm work to
> deliver 3 payment units (3 * 1000msat) from `S` to `D`.
>
> Let us suppose that the mincostflow algorithm returns the following
> flow:
>
> S -->3-----> A ->1----> B
>              |          |
>              |          1
>              |          v
>              +-->2----> C ->3----> D
>
> Actually, what the mincostflow algorithm uses as a cost function
> would be:
>
> -log(success_probability) + fudging_factor * amount * prop_fee *
> payment_unit + fudging_factor * base_fee * amount
>
> == -log(success_probability) + fudging_factor * amount * (prop_fee *
> payment_unit + base_fee)
>
> where: amount is in units of 1000msat, our smallest practical HTLC.
> payment_unit is the unit, i.e. 1000msat.
>
> What the above means is that the mincostflow algorithm *allocates*
> `3 * base_fee` for `C->D`, since the `amount` flowing through
> `C->D` would be 3.
> However, when we pass the above flow to the dissect algorithm, it
> would actually only split this into 2 sub-payments, so the
> actual payment plan would only pay `2 * base_fee` for the
> `C->D` leg on both sub-payments.
>
> In short, this effectively converts the base fee to a proportional
> fee, removing the zerobasefee requirement imposed by the dissect
> algorithm.
>
> That is, the cost computed by the mincostflow algorithm is really a
> maximum cost budget that the subsequent dissect algorithm could later
> spend.
>
> In effect, this converts the base fee to a proportional fee.
>
> This may be acceptable in practice.
> This approximation has a bias against non-zerobasefee --- it would
> treat those channels as being far more expensive than they actually
> would end up being in an *actual* payment attempt --- but at least
> does not *require* zerobasefee.
> It would be able to still use non-zerobasefee channels if those are
> absolutely required to reach the destination.
>
> This should at least help create a practical payment algorithm that
> handles current LN with nonzerobasefee, and provide an impetus
> towards making zerobasefee a reality.
>
> Note that this approximation would be non-optimal --- there may
> be solutions that provide lower costs than the output of this
> approximation would provide.
>
> It may be practical to have the payment unit be adjustable by a
> higher-level (but still automated) process.
> For example, it may be considered that having more than 500 splits
> would be heuristically determined to be impractical, in which
> case the payment unit could be the actual amount divided by 500
> rounded up.
> Then if that results in extreme overestimation of base costs
> (i.e. the subsequent dissect stage results in far fewer than 500
> splits) the payment unit could be adjusted upwards.
> Similarly, if that results in a payment plan that has high
> base fees, the payment unit could be adjusted downwards.
>
> --
>
> Similarly, having particular split amounts is helpful to reduce
> the ability of intermediate nodes snooping on payment amounts,
> so having a particular fixed payment unit may also be useful
> nevertheless, even if this approximation is otherwise undesirable.
>
> That is, in the above example, we just send 3 sub-payments
> instead of the minimum 2, so that every sub-payment is always
> uniformly 1000msat.
> This leaks the least amount of information to intermediate nodes,
> especially after payment decorrelation is widely deployed.
> (This is the same principle for why a uniform unit amount is
> needed in practical CoinJoin implementations; it increases the
> anonymity set of individual sub-payments.)
>
> This would be helped greatly by payment decorrelation;
> intermediate nodes would remain uncertain whether multiple
> HTLCs with the same amount and the same in-channel and
> out-channel are from the same overall payment or not.
> Assuming much software uses the same payment unit consistently,
> less information can be extracted by surveillors, and the
> increased cost may be considered justifiable in paying for
> additional privacy.
>
> --
>
> ### Whole-flow Payments
>
> Suppose we were to instead modify the LN protocol such that,
> instead of MPP having individual sub-payments that split at the
> source and join at the destination, we allow the source to
> instruct arbitrary intermediate nodes to join and/or split
> payments.
>
> That is, suppose LN allowed the source to indicate to arbitrary
> forwarding nodes to split payments and join them.
> (This would require nearly all nodes to upgrade, which may be
> impractical, but let us consider its implications first without
> committing to actually doing this.)
>
> A node that is a join node would be instructed to wait for all
> incoming HTLCs to arrive before it creates any outgoing
> HTLCs.
> A node that is a split node would make more than one outgoing
> HTLC.
> Forwarding nodes can be both join and split nodes.
>
> This would allow the output of the mincostflow algorithm to be
> used directly, without the problematic dissect stage.
> Nodes that have more than one in-flow would be join nodes,
> while nodes with more than one out-flow would be split nodes.
>
> Returning to the example:
>
> S -->3000--> A ->1000-> B
>              |          |
>              |         1000
>              |          v
>              +-->2000-> C ->3000-> D
>
> In the above, `S` sends a *single* payment out on channel
> `S->A`, and instructs `A` to split the payment.
> Then `S` also instructs `C` to join two payments, resulting
> in a single HTLC out on `C->D` if `C` receives on both
> `A->C` and `B->C`.
>
> The effect is that, for a single payment plan (presented as a
> flow, outputted by a mincostflow algorithm), each channel
> with nonzero flow would carry only one HTLC that gets paid
> for using a single `base_fee`.
>
> Then, the base fee computed by the cost function would be exact
> and there would be no underestimation of the base fee for
> non-zerobasefee channels.
>
> Let us then consider the practicality of this scheme.
>
> If a hop were to fail to deliver, then some join nodes would
> be left hanging, waiting for an incoming HTLC that will not
> arrive.
>
> The failure would have to be handled by a split node, when it
> receives an `update_fail_htlc`.
> That split node cannot propagate the failure backwards to the
> source, since it would still have other outgoing HTLCs in
> play, which it cannot safely ignore; it can only propagate
> the failure backwards if all the other outgoing HTLCs also
> fail.
>
> To support this, a split node could give a request-to-fail
> signal to its other outgoing HTLCs.
> Then, any other join nodes still waiting for an incoming
> HTLC (and which would not yet have created any outgoing HTLCs,
> or for the destination, would not have claimed any incoming
> HTLCs) would fail all their incoming HTLCs.
> Otherwise, an intermediate node receiving a request-to-fail
> signal from any incoming HTLC would simply propagate
> this request-to-fail outwards to all its outgoing HTLCs.
>
> This implies that for this scheme, if *any* hop fails, the
> entire payment fails and has to be restarted "from the top".
>
> This is in contrast with the current scheme of splitting
> only at the source and joining only at the destination.
> In the current LN, if *any* hop of an MPP fails, only the
> sub-payment(s) going through that hop will need to be
> retried.
> Other sub-payment(s) can keep being "in play", and the
> source, having all the information needed, only needs to
> recompute the failing outgoing payments.
>
> On the other hand, the same information (particular hops are
> failing) would be propagated back to the source, and the
> source having to recompute the entire flow rather than just
> some subset of the payments is not much worse computationally
> (work is still dominated by `O((m*m+m*n)*log n)` where `m` is
> the channels of the *entire network* and `n` is the nodes
> of the *entire network*, amount is just a piddling little
> `O(log(u))` term).
> It also removes the need to virtually reduce the capacity of
> intermediate hops while other sub-payments are in play
> (however, a practical implementation would need to consider
> the possibility of multiple *simultaneous* outgoing payments
> anyway and would still retain this reduction of capacity in
> case a new parallel outgoing payment is triggered while one
> is still ongoing).
>
> This leads us to some fairly interesting choices:
>
> * [ ] Whole-flow payment is a brilliant idea and we should
>   totally do it. `#deletethezerobasefeedebate`
> * [ ] Whole-flow payment is a horrible idea and we should
>   totally not do it.
> * [ ] As we can see, the LN community needs to get together
>   for better payment success, and supporting whole-flow
>   payments is worse than setting `basefee=0`. `#zerobasefee`
> * [ ] As we can see, the current LN provides a vital service
>   (partial failure of MPP) and intermediate nodes need to be
>   compensated for this service. `#nonzerobasefee`
>
> Regards,
> ZmnSCPxj
> _______________________________________________
> Lightning-dev mailing list
> Lightning-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev

--
https://www.rene-pickhardt.de
