Re: [Lightning-dev] Improve Lightning payment reliability through better error attribution

ZmnSCPxj via Lightning-dev Fri, 14 Jun 2019 19:53:56 -0700

Good morning Joost,

> Yes that is accurate, although using the time difference between receiving 
> the `update_add_htlc` and sending back the `update_fail_htlc` would work too. 
> It would then include the node's processing time.


It would not work safely.
A node can only propagate an `update_fail_htlc` if the downstream 
`update_fail_htlc` has been irrevocably committed by `revoke_and_ack`.
See BOLT spec about this.

Suppose we have a route A -> B -> C.
C sends `update_fail_htlc` immediately, but dallies on `revoke_and_ack`.
B cannot send `update_fail_htlc` to A yet, because C can still drop the 
previous B-C channel state onchain (it is not yet revoked, that is what the 
`revoke_and_ack` will later do).
If B sends `update_fail_htlc` to A as soon as it receives `update_fail_htlc` 
from C, A can use the new A-B channel state onchain, while at the same time C 
drops the previous B-C channel state onchain.
The new A-B channel state returns the HTLC to A, while the previous B-C channel 
state has the HTLC still claimable by C, causing B to lose funds.
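
The fund loss above can be checked with a small accounting sketch.
This is only an illustration under simplifying assumptions: fees and timelocks are ignored, and the function name and parameters are hypothetical, not part of any implementation.

```python
def b_net_result(htlc_msat: int,
                 ab_state_has_htlc: bool,
                 bc_state_has_htlc: bool,
                 c_claims: bool) -> int:
    """B's net change if both channels are resolved onchain (fees ignored).

    ab_state_has_htlc: the A-B state that reaches the chain still holds the HTLC.
    bc_state_has_htlc: the B-C state that reaches the chain still holds the HTLC.
    c_claims: C claims the HTLC on the B-C side.
    """
    # B can only collect from A if the HTLC still exists on the A-B side.
    received_from_a = htlc_msat if (ab_state_has_htlc and c_claims) else 0
    # B pays C only if the HTLC still exists on the B-C side and C claims it.
    paid_to_c = htlc_msat if (bc_state_has_htlc and c_claims) else 0
    return received_from_a - paid_to_c


# B failed the HTLC towards A too early: the A-B state no longer has the
# HTLC, but C drops the old, unrevoked B-C state and claims it.
print(b_net_result(1000, ab_state_has_htlc=False, bc_state_has_htlc=True, c_claims=True))

# B waited for revoke_and_ack: the HTLC is still present on both sides.
print(b_net_result(1000, ab_state_has_htlc=True, bc_state_has_htlc=True, c_claims=True))
```

In the first case B is out the full HTLC amount; in the second it breaks even.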

For `update_fulfill_htlc` B can immediately propagate to A (without waiting for 
`revoke_and_ack` from C) since C is already claiming the money.

Since B cannot report the `update_fail_htlc` immediately, its timer should 
still be running.
Suppose we counted only up to `update_fail_htlc` and not on the 
`revoke_and_ack`.
If C sends `update_fail_htlc` immediately, then the 
`update_add_htlc`->`update_fail_htlc` time reported by B would be fast.
But if C then does not send `revoke_and_ack`, B cannot safely propagate 
`update_fail_htlc` to A, so the time reported by A will be slow.
This sudden jump in time between A and B will be blamed on A and B, while C 
goes unpunished.
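
To make the misattribution concrete, here is a rough sketch with made-up timestamps.
All numbers and names are hypothetical; it only compares what B would report under the two timer conventions against what A observes.

```python
# All times are hypothetical, in milliseconds since A sent update_add_htlc to B.
B_ADD_SENT_TO_C = 100          # B forwards update_add_htlc to C
B_FAIL_RECV_FROM_C = 200       # C sends update_fail_htlc immediately
B_REVOKE_RECV_FROM_C = 30_200  # C dallies 30 seconds on revoke_and_ack
A_FAIL_RECV_FROM_B = 30_300    # B can only now fail the HTLC towards A


def unexplained_gap(a_observed_ms: int, b_reported_ms: int) -> int:
    """Time A observed that B's own report does not account for."""
    return a_observed_ms - b_reported_ms


# Convention 1: B stops its timer on receiving update_fail_htlc.
b_reports_fail_only = B_FAIL_RECV_FROM_C - B_ADD_SENT_TO_C
# Convention 2: B stops its timer on receiving revoke_and_ack.
b_reports_with_revoke = B_REVOKE_RECV_FROM_C - B_ADD_SENT_TO_C

gap_fail_only = unexplained_gap(A_FAIL_RECV_FROM_B, b_reports_fail_only)
gap_with_revoke = unexplained_gap(A_FAIL_RECV_FROM_B, b_reports_with_revoke)

print(gap_fail_only)    # 30200: the delay lands between A and B
print(gap_with_revoke)  # 200: the delay is accounted for on the B-C hop
```

Under the first convention, 30 seconds go missing between A and B even though C caused the delay.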

That is why, for failures, we ***must*** wait for `revoke_and_ack`.
The node must report the time when it can safely propagate the error report 
upstream, not the time it receives the error report.
For payment fulfillment, `update_fulfill_htlc` is fine without waiting for 
`revoke_and_ack` since it is always reported immediately upstream anyway.
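
The rule can be sketched as a tiny helper that decides when a forwarding node stops its timer.
This is a hedged illustration only; `Resolution` and `timer_stop_ms` are hypothetical names, and the timestamps are whatever clock the node uses locally.

```python
from enum import Enum
from typing import Optional


class Resolution(Enum):
    FULFILL = "update_fulfill_htlc"
    FAIL = "update_fail_htlc"


def timer_stop_ms(resolution: Resolution,
                  resolution_received_ms: int,
                  revoke_received_ms: Optional[int]) -> Optional[int]:
    """Return the time at which the node can safely propagate upstream.

    Returns None while the node must keep waiting (timer still running).
    """
    if resolution is Resolution.FULFILL:
        # A fulfill is propagated upstream immediately.
        return resolution_received_ms
    # A failure is only safe to propagate once the downstream peer has
    # sent revoke_and_ack, so the timer runs until then.
    return revoke_received_ms


print(timer_stop_ms(Resolution.FULFILL, 200, None))
print(timer_stop_ms(Resolution.FAIL, 200, None))
print(timer_stop_ms(Resolution.FAIL, 200, 30_200))
```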

See my discussion about "fast forwards": 
https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-April/001986.html

> I think we could indeed do more with the information that we currently have 
> and gather some more by probing. But in the end we would still be sampling a 
> noisy signal. More scenarios to take into account, less accurate results and 
> probably more non-ideal payment attempts. Failed, slow or stuck payments 
> degrade the user experience of lightning, while "fat errors" arguably don't 
> impact the user in a noticeable way.

Fat errors just give you more information when a problem happens for a "real" 
payment.
But the problem still occurs on the "real" payment and user experience is still 
degraded.

Background probing gives you the same information **before** problems happen 
for "real" payments.

Regards,
ZmnSCPxj
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
