# Re: [Lightning-dev] Handling nonzerobasefee when using Pickhardt-Richter algo variants

Dear ZmnSCPxj,

thank you very much for this mail and in particular the openness to tinker
about a protocol change with respect to the transport layer of HTLCs /
PTLCs! I fully agree that we should think about adapting our onion
transport layer in a way that supports local payment splits and merges to
resemble the flowing nature of payments more closely! While an update for
that might be tricky, I think right now is a perfect moment for such
discussions, as we need a full upgrade anyway when going to PTLCs
(which in fact might even be helpful for a local splitting / merging
logic, as secrets would be additive / linear). Before I elaborate, let me
state a few things and correct an error in your mail about the convex
nature of the fee function and the problems it causes for the min-cost flow
solver.

I agree with you (and AJ) that there is a problem with the base fee that
comes from overpaying when dissecting a flow into paths which share a
channel. However, this is not related to the convexity issue. In a recent
mail [0] I conducted the calculation demonstrating why the current fee
function f(x) = x*r + b (with f(0) = 0) is not even linear (and every
linear function would be convex). As described in the paper, the problem
for min-cost flow solving algorithms arises when the cost function (in this
case the fee function) is neither linear nor convex.

The issue that breaks convexity is really subtle here and arises when going
from zero flow to some flow. In that sense I apologize that the paper might
be slightly misleading in its presentation. While convexity is often checked
via the second-derivative argument, our function is in fact defined on the
integers and thus not even differentiable, which is why the second-derivative
argument that we use to show the convexity of the negative log
probabilities does not transfer.

Let us better go with the wikipedia definition of convexity [1] that
states:
"In mathematics, a real-valued function is called convex if the line
segment between any two points on the graph of the function lies above the
graph between the two points."

So the way to test convexity is to make sure that for any two values
x_1 and x_2 the line connecting the points (x_1, f(x_1)) and (x_2, f(x_2))
lies above all other points (x, f(x)) for x in {x_1, ..., x_2}. In our
case we have f(0) = 0 and f(2) = 2r + b.
The line connecting those two points is given by l(x) = ((2r+b)/2)*x =
(r + b/2)*x, which means that l(1) = r + b/2.
However, f(1) = r + b, which is larger than l(1) for any positive value of b.
(Note that with a zero base fee b = 0 we have f(1) = r = l(1), which would
be sufficient.)

If you implement the capacity scaling algorithm for a min-cost flow solver
you will see that the algorithm linearizes the convex costs all the time by
switching to a unit cost in each delta-scaling phase (as described in
sections 9, 10.2 and 14.1 through 14.5 of the book [2] that we refer to in
the paper and that Stefan already mentioned to you). This seems very
similar to the thoughts you had in your mail (though even after
reading it three times I can't verify how you turned the base fee into a
proportional term by switching to a unit-sized payment amount). You will
also recognize that the base fee is not linearizable as a unit cost and will
break the reduced-cost optimality criterion in the delta phases of the
algorithm. That being said, in our mainnet tests we actually used 10k and
100k sats as a fixed minimum unit size for HTLCs in the flow computation,
but more about that in the already announced and soon-to-come mail about
the mainnet tests.
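
To illustrate the linearization issue (a sketch with made-up values, not a solver implementation): capacity-scaling solvers treat the cost of each extra delta of flow as a unit cost, which is only valid if those marginal costs are nondecreasing, i.e. if the cost function is convex. With a base fee b the very first delta costs r*delta + b while every later delta costs only r*delta, so the marginals decrease:

```python
# Marginal cost of each successive delta of flow through one channel.
def fee(x, r=0.001, b=1.0):
    return 0.0 if x == 0 else r * x + b

def marginal_costs(f, delta, steps):
    """Cost of the i-th extra delta of flow: f((i+1)*delta) - f(i*delta)."""
    return [f((i + 1) * delta) - f(i * delta) for i in range(steps)]

mc = marginal_costs(fee, delta=1000, steps=3)
print(mc)             # the first step pays the base fee on top of r*delta
print(mc[0] > mc[1])  # True: marginals decrease, so the cost is not convex
```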

Long story short: while the base fee indeed creates a problem for a flow
being dissected into paths, the issue seems to be much more severe because
of this discontinuous jump when going from f(0) to f(1).

All that being said, I am very delighted to see that you propose a protocol
change towards "whole flow payments" (please allow me to still call them
payment flows, as our paper title suggested). In my recent mail [0] I said I
would want to write another mail about our mainnet test results and the
consequences for the protocol, node operators, users etc. One of the
consequences goes fully along with the idea that you described here, and
thus I very much like / appreciate your idea / proposal! As the other mail
was too long anyway, let me quickly copy and paste the text about the 5
consequences for the protocol from my draft, and note that the second one
goes fully along with your proposal:

Consequences for the Protocol
========================

1.) As we send larger amounts we confirmed what many people already knew:
stuck and hanging HTLCs are becoming more of a problem. The main reason
that in our experiments we could not deliver the full amount was that we
"lost" HTLCs: they neither arrived at the destination in time nor
returned as errors to the sender quickly enough before the MPP timeout.
We very much need a cancelable payment mechanism [3]. I know taproot and
PTLCs are coming, but I think we should really emphasize this aspect. I
think the focus is of particular interest because it is bad that the
current way of canceling an onion with the help of PTLCs can still lead to
channel failure on the remote channel where the PTLC is stuck, and can
still bind the local liquidity for a long time, potentially preventing the
sender from conducting the full payment and also binding the liquidity of
the residual flow.

2.) When dissecting a payment flow into several paths it can and will
happen that the amount amt sent through a remote channel actually
comes from several inbound channels of the sending node of that channel, or
goes to several outbound channels of the receiving node of that
channel. To me it seems reasonable to adapt the transport of HTLCs (or,
when redesigning, of PTLCs) to a transport mechanism that better
reflects this reality of flows. In particular I envision a new lightning
message called update_update_htlc to complement update_add_htlc. This
message would allow increasing or reducing the amount of an HTLC committed /
offered to a channel, so that folks would not have to keep parallel HTLCs
in a channel that all belong to the same payment. Maybe the channel factory
ideas on eltoo channels that AJ Towns mentioned might be another
direction to achieve this. Also, I believe similar ideas have been
discussed with link-AMP in the past.

3.) Especially when optimizing for success probabilities, I believe there is
a strong argument to solve min-cost flows not for the amount to be paid but
rather in such a way that the expected value delivered matches the amount
that one wishes to send. We can achieve this by computing the optimal flow
for a larger amount and then sending out redundant but cancelable onions,
similar to the mechanism described in the Boomerang paper [4], which I am
happy to recommend looking at again.
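
As a hedged sketch of this point (p_deliver is a made-up illustrative number, the assumed probability that any given unit of flow arrives): pick the flow amount so that the expected delivered value equals the target, and send the surplus as redundant, cancelable onions as in Boomerang [4].

```python
import math

def flow_amount_for_target(target_msat, p_deliver):
    # E[delivered] = p_deliver * sent  =>  sent = target / p_deliver
    return math.ceil(target_msat / p_deliver)

sent = flow_amount_for_target(100_000, p_deliver=0.8)
print(sent)  # ~25% more than the target; the overshoot gets canceled
```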

4.) I want to recall that nodes probe the channels of their neighbors
anyway while delivering payments. If we had friend-of-a-friend balance
sharing [5] together with a fee-free rebalancing protocol, we would
drastically reduce the uncertainty around the sender and the recipient (to
be included in blinded-path routing hints in the invoices or offers),
improving the overall speed at which larger payments arrive. Additionally,
the prior probability changes from a uniform distribution to a normal
distribution, as shown in figure 13 of the probabilistic payment paper [6],
which again reduces the necessary attempts, as shown in figure 14 of the
same paper.

5.) If we either believe it is best to drop the base fee, and/or if we
see that users stop using it, we might want to think about deprecating the
base fee on the protocol level as well, or (as was, if I recall correctly,
done with the do-not-fragment bit / flag in IP) reinterpreting the
semantics of the base fee.

As you can see, when I was about to introduce this idea it was not because
I was worried about paying the base fee several times in dissection (I
guess the base fee will go away anyway), but for the more practical reason
of HTLC space in commitment transactions (which I think is another
excellent reason) to think more about a transport layer that respects
flows, as it also addresses the fixed-cost issues (not in terms of
signatures but in terms of potential on-chain fees and transaction size).
That being said, I am not sure how easy it will be for us to create an
onion transport that actually communicates the flow, but I was arguing that
maybe with a simple new lightning message called update_update_htlc, which
would allow updating the amount of a given HTLC, we could create something
similar to local payment splitting and merging. In fact, this could work in
particular if an HTLC of size X arrives at a channel that has less
liquidity than X. The forwarding node could back-propagate the value that
was possible by lowering all upstream HTLCs to the liquidity it has in the
downstream channel (though I haven't fully thought through all the
consequences with respect to security yet).
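
A purely hypothetical sketch of that back-propagation idea (update_update_htlc is the invented message name from this mail, not part of the BOLT specs; amounts are illustrative):

```python
def forward(htlc_amount, downstream_liquidity):
    """Return (amount actually forwarded, amount to release upstream)."""
    deliverable = min(htlc_amount, downstream_liquidity)
    # The forwarding node would emit update_update_htlc(new_amount=deliverable)
    # on its upstream channel; every upstream hop propagates the reduction
    # further back, freeing the difference for the sender's residual flow.
    return deliverable, htlc_amount - deliverable

print(forward(3000, 1800))  # (1800, 1200): forward what fits, shrink the rest
```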

With kind regards,
Rene

[0]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2021-August/003203.html
[1]: https://en.wikipedia.org/wiki/Convex_function
[2]: Network Flows: Theory, Algorithms, and Applications, book by Ravindra
K. Ahuja, Thomas L. Magnanti, and James B. Orlin
[3]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-June/002029.html
[4]: https://arxiv.org/pdf/1910.01834.pdf
[5]: https://github.com/lightningnetwork/lightning-rfc/pull/780
[6]: https://arxiv.org/abs/2103.08576

On Mon, Aug 30, 2021 at 12:58 PM ZmnSCPxj via Lightning-dev <
lightning-dev@lists.linuxfoundation.org> wrote:

> Good morning list,
>
> It seems to me that the cost function defined in Pickhardt-Richter
> can in fact include a base fee:
>
>     -log(success_probability) + fudging_factor * amount * prop_fee +
> fudging_factor * base_fee
>
> (Rant: why do mathists prefer single-symbol var names, in software
> engineering there is a strong injunction **against** single-char
> var names, with exceptions for simple numeric iteration loop vars,
> I mean keeping track of the meaning of each single-symbol var name
> takes up working memory space, which is fairly limited on a wetware
> CPU, which is precisely why in software engineering there is an
> injunction against single-char var names, srsly.)
>
> It seems to me that the above is "convex", as the fudging_factor
> for a run will be constant, and the base_fee for the channel
> would also be constant, and since it seems to me, naively, that
> the paper defines "convex" as "the second derivative >= 0 for all
> amount", and since the derivative of a constant is 0, the above
> would still remain convex.
> (I am not a mathist and this conjecture might be completely asinine.)
>
> So, it seems to me that the *real* reason for #zerobasefee is
> not that the **mincostflow** algorithm cannot handle non-zero base_fee,
> but rather, the **dissect** phase afterwards cannot handle non-zero
> base_fee.
>
> For example, suppose the mincostflow algorithm were instructed to
> deliver 3,000msat from S to D, and returned the following flow:
>
>     S -->3000--> A ->1000-> B
>                  |          |
>                  |        1000
>                  |          v
>                  +-->2000-> C ->3000-> D
>
> In the "disect" phase afterwards, the above flow solution would have
> to be split into two sub-payments, a 1000 sub-payment S-A-B-C-D
> and a 2000 sub-payment S-A-C-D.
>
> However, this does mean that the base cost for C->D and S->A in
> the above flow will pay *twice* the base cost than what the above
> cost function would have computed, because in reality two independent
> payments pass through those channels.
>
> Thus, the current Pickhardt-Richter scheme cannot work with non-zero
> base fees if it were modified to consider fees, but not due to the
> mincostflow algorithm, but rather due to the dissect algorithm.
>
> ### Converting Non-Zero Base Fees To Proportional Fees
>
> An alternative solution would be to optimize for a *maximum* fee
> rather than optimize for the *exact* fee.
>
> We can do this by virtually splitting up the entire payment into
> smaller bunches of value, and asking mincostflow to solve individual
> bunches rather than individual msats.
>
> For example, we can decide that the smallest practical HTLC would
> be 1000 msat.
> Let us call this the "payment unit", and the mincostflow algo
> solves for paying in these payment units instead of 1msat units.
> (Actual implementations would need to have some heuristic or
> reasoned rule-of-thumb assumption on what the smallest practical
> HTLC would be.)
>
> In our example, suppose we need to send 2001msat from S to D,
> with the payment unit being 1000 msat.
> Then we would actually have the mincostflow algorithm work to
> deliver 3 payment units (3 * 1000msat) from S to D.
>
> Let us suppose that the mincostflow algorithm returns the following
> flow:
>
>     S -->3-----> A ->1----> B
>                  |          |
>                  |        1 |
>                  |          v
>                  +-->2----> C ->3----> D
>
> Actually, what the mincostflow algorithm uses as a cost function
> would be:
>
>     -log(success_probability) + fudging_factor * amount * prop_fee *
> payment_unit + fudging_factor * base_fee * amount
>
>     == -log(success_probability) + fudging_factor * amount * (prop_fee *
> payment_unit + base_fee)
>
>     where: amount is in units of 1000msat, our smallest practical HTLC.
>            payment_unit is the unit, i.e. 1000msat.
>
> What the above means is that the mincostflow algorithm *allocates*
> 3 * base_fee for C->D, since the amount flowing through
> C->D would be 3.
> However, when we pass the above flow to the dissect algorithm, it
> would actually only split this into 2 sub-payments, so the
> actual payment plan would only pay 2 * base_fee for the
> C->D leg on both sub-payments.
>
> In short, this effectively converts the base fee to a proportional
> fee, removing the zerobasefee requirement imposed by the dissect
> algorithm.
>
> That is, the cost computed by the mincostflow algorithm is really a
> maximum cost budget that the subsequent dissect algorithm could later
> spend.
>
> In effect, this converts the base fee to a proportional fee.
>
> This may be acceptable in practice.
> This approximation has a bias against non-zerobasefee --- it would
> treat those channels as being far more expensive than they actually
> would end up being in an *actual* payment attempt --- but at least
> does not *require* zerobasefee.
> It would be able to still use non-zerobasefee channels if those are
> absolutely required to reach the destination.
>
> This should at least help create a practical payment algorithm that
> handles current LN with nonzerobasefee, and provide an impetus
> towards making zerobasefee a reality.
>
> Note that this approximation would be non-optimal --- there may
> be solutions that provide lower costs than the output of this
> approximation would provide.
>
> It may be practical to have the payment unit be adjustable by a
> higher-level (but still automated) process.
> For example, it may be considered that having more than 500 splits
> would be heuristically determined to be impractical, in which
> case the payment unit could be the actual amount divided by 500
> rounded up.
> Then if that results in extreme overestimation of base costs
> (i.e. the subsequent dissect stage results in far fewer than 500
> splits) the payment unit could be adjusted upwards.
> Similarly, if that results in a payment plan that has high
> base fees, the payment unit could be adjusted downwards.
>
> --
>
> Similarly, having particular split amounts is helpful to reduce
> the ability of intermediate nodes snooping on payment amounts,
> so having a particular fixed payment unit may also be useful
> nevertheless, even if this approximation is otherwise undesirable.
>
> That is, in the above example, we just send 3 sub-payments
> instead of the minimum 2, so that every sub-payment is always
> uniformly 1000msat.
> This leaks the least amount of information to intermediate nodes,
> especially after payment decorrelation is widely deployed.
> (this is the same principle for why a uniform unit amount is
> needed in practical CoinJoin implementations; it increases the
> anonymity set of individual sub-payments.)
>
> This would be helped greatly by payment decorrelation;
> intermediate nodes would remain uncertain that multiple
> HTLCs with the same amount and the same in-channel and
> out-channel are from the same overall payment or not.
> Assuming many software implementations use the same payment unit
> consistently, less information can be extracted by surveillors, and the
> increased cost may be considered a justifiable price for this privacy.
> --
>
> ### Whole-flow Payments
>
> Suppose we were to instead modify the LN protocol such that,
> instead of MPP having individual sub-payments that split at the
> source and join at the destination, we allow the source to
> instruct arbitrary intermediate nodes to join and/or split
> payments.
>
> That is, suppose LN allowed the source to indicate to arbitrary
> forwarding nodes to split payments and join them.
> (This would require nearly all nodes to upgrade, which may be
> impractical, but let us consider its implications first without
> committing to actually doing this.)
>
> A node that is a join node would be instructed to wait for all
> incoming HTLCs to arrive before they create any outgoing
> HTLCs.
> A node that is a split node would make more than one outgoing
> HTLCs.
> Forwarding nodes can be both join and split nodes.
>
> This would allow the output of the mincostflow algorithm to be
> used directly, without the problematic dissect stage.
> Nodes that have more than one in-flow would be join nodes,
> while nodes with more than one out-flow would be split nodes.
>
> Returning to the example:
>
>     S -->3000--> A ->1000-> B
>                  |          |
>                  |        1000
>                  |          v
>                  +-->2000-> C ->3000-> D
>
> In the above, S sends a *single* payment out on channel
> S->A, and instructs A to split the payment.
> Then S also instructs C to join two payments, resulting
> in a single HTLC out on C->D if C receives on both
> A->C and B->C.
>
> The effect is that, for a single payment plan (presented as a
> flow, outputted by a mincostflow algorithm) for each channel
> with nonzero flow, there would only be one HTLC that gets paid
> for using a single base_fee.
>
> Then, the base fee computed by the cost function would be exact
> and there would be no underestimation of the base fee for
> non-zerobasefee channels.
>
> Let us then consider the practicality of this scheme.
>
> If a hop were to fail to deliver, then some join nodes would
> be left hanging, waiting for an incoming HTLC that will not
> arrive.
>
> The failure would have to be handled by a split node, when it
> receives a update_fail_htlc.
> That split node cannot propagate the failure backwards to the
> source, since it would still have other outgoing HTLCs in
> play, which it cannot safely ignore; it can only propagate
> the failure backwards if all the other outgoing HTLCs also
> fail.
>
> To support this, a split node could give a request-to-fail
> signal to its other outgoing HTLCs.
> Then, any other join nodes still waiting for an incoming
> HTLC (and would not yet have created any outgoing HTLCs,
> or for the destination, would not have claimed any incoming
> HTLCs) would fail all its incoming HTLCs.
> Otherwise, an intermediate node receiving a request-to-fail
> signal from any incoming HTLCs would simply propagate
> this request-to-fail outwards to all its outgoing HTLCs.
>
> This implies that for this scheme, if *any* hop fails, the
> entire payment fails and has to be restarted "from the top".
>
> This is in contrast with the current scheme of splitting
> only at the source and joining only at the destination.
> In the current LN, if *any* hop of an MPP fails, only the
> sub-payment(s) going through that hop will need to be
> retried.
> Other sub-payment(s) can keep being "in play", and the
> source, having all the information needed, only needs to
> recompute the failing outgoing payments.
>
> On the other hand, the same information (particular hops are
> failing) would be propagated back to the source, and the
> source having to recompute the entire flow rather than just
> some subset of the payments is not much worse computationally
> (work is still dominated by O((m*m+m*n)*log n) where m is
> the channels of the *entire network* and n is the nodes
> of the *entire network*, amount is just a piddling little
> O(log(u)) term).
> It also removes the need to virtually reduce the capacity of
> intermediate hops while other sub-payments are in play
> (however, a practical implementation would need to consider
> the possibility of multiple *simultaneous* outgoing payments
> anyway and would still retain this reduction of capacity in
> case a new parallel outgoing payment is triggered while one
> is still ongoing).
>
> This leads us to some fairly interesting choices:
>
> * [ ] Whole-flow payment is a brilliant idea and we should
>   totally do it.  #deletethezerobasefeedebate
> * [ ] Whole-flow payment is a horrible idea and we should
>   totally not do it.
>   * [ ] As we can see, the LN community needs to get together
>     for better payment success, and supporting whole-flow
>     payments is worse than setting basefee=0. #zerobasefee
>   * [ ] As we can see, the current LN provides a vital service
>     (partial failure of MPP) and intermediate nodes need to be
>     compensated for this service.  #nonzerobasefee
>
> Regards,
> ZmnSCPxj
> _______________________________________________
> Lightning-dev mailing list
> Lightning-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
>

--
https://www.rene-pickhardt.de

_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev