Good morning Joost,

> Yes that is accurate, although using the time difference between receiving
> the `update_add_htlc` and sending back the `update_fail_htlc` would work too.
> It would then include the node's processing time.

It would not work safely.

A node can only propagate an `update_fail_htlc` upstream once the downstream `update_fail_htlc` has been irrevocably committed by `revoke_and_ack`. See the BOLT spec about this.

Suppose we have a route A -> B -> C. C sends `update_fail_htlc` immediately, but dallies on `revoke_and_ack`. B cannot send `update_fail_htlc` to A yet, because C can still drop the previous B-C channel state onchain (it is not yet revoked; that is what the `revoke_and_ack` will later do).

If B sends `update_fail_htlc` to A as soon as it receives `update_fail_htlc` from C, A can use the new A-B channel state onchain, while at the same time C drops the previous B-C channel state onchain. The new A-B channel state returns the HTLC to A, while the previous B-C channel state has the HTLC still claimable by C, causing B to lose funds.

For `update_fulfill_htlc`, B can immediately propagate to A (without waiting for `revoke_and_ack` from C), since C is already claiming the money.

Since B cannot report the `update_fail_htlc` immediately, its timer should still be running.

Suppose we counted only up to `update_fail_htlc` and not up to `revoke_and_ack`. If C sends `update_fail_htlc` immediately, then the `update_add_htlc`->`update_fail_htlc` time reported by B would be fast. But if C then does not send `revoke_and_ack`, B cannot safely propagate `update_fail_htlc` to A, so the time reported by A will be slow. This sudden transition of time from A to B will be blamed on A and B, while C is unpunished.

That is why, for failures, we ***must*** wait for `revoke_and_ack`. The node must report the time when it can safely propagate the error report upstream, not the time it receives the error report.

For payment fulfillment, `update_fulfill_htlc` is fine without waiting for `revoke_and_ack`, since it is always reported immediately upstream anyway.

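To make the timing rule concrete, here is a rough sketch in Python of the timer B might keep for one forwarded HTLC. The `HtlcTimer` class and its method names are hypothetical and not taken from any implementation. It also simplifies the commitment dance: it assumes the first `revoke_and_ack` from C after its `update_fail_htlc` is the one that revokes the old B-C state still containing the HTLC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HtlcTimer:
    """Hold-time timer B keeps for one HTLC on A -> B -> C (hypothetical)."""
    added_at: float                        # B received update_add_htlc from A
    fail_seen_at: Optional[float] = None   # C sent update_fail_htlc
    stopped_at: Optional[float] = None     # B could safely report upstream

    def on_update_fulfill_htlc(self, now: float) -> None:
        # A fulfill is propagated upstream at once, so the timer stops here.
        if self.stopped_at is None:
            self.stopped_at = now

    def on_update_fail_htlc(self, now: float) -> None:
        # Not yet safe to fail upstream: C can still publish the old state.
        # Only remember when the failure arrived; keep the timer running.
        if self.fail_seen_at is None:
            self.fail_seen_at = now

    def on_revoke_and_ack(self, now: float) -> None:
        # Assumption: this revoke_and_ack revokes the state holding the HTLC.
        if self.fail_seen_at is not None and self.stopped_at is None:
            self.stopped_at = now

    def naive_time(self) -> Optional[float]:
        # What B would report if it counted only up to update_fail_htlc.
        if self.fail_seen_at is None:
            return None
        return self.fail_seen_at - self.added_at

    def hold_time(self) -> Optional[float]:
        # What B should report: time until it could safely go upstream.
        if self.stopped_at is None:
            return None
        return self.stopped_at - self.added_at


# C fails at once but dallies on revoke_and_ack.
timer = HtlcTimer(added_at=0.0)
timer.on_update_fail_htlc(now=0.5)
still_running = timer.hold_time() is None   # True: B cannot report yet
timer.on_revoke_and_ack(now=30.0)
naive, reported = timer.naive_time(), timer.hold_time()  # 0.5 versus 30.0
```

In this run the naive measurement is 0.5 while A would observe roughly 30, which is the gap that would be blamed on A and B; reporting `hold_time()` instead keeps the delay attributed to the B-C hop.
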
See my discussion about "fast forwards": https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-April/001986.html

> I think we could indeed do more with the information that we currently have
> and gather some more by probing. But in the end we would still be sampling a
> noisy signal. More scenarios to take into account, less accurate results and
> probably more non-ideal payment attempts. Failed, slow or stuck payments
> degrade the user experience of lightning, while "fat errors" arguably don't
> impact the user in a noticeable way.

Fat errors just give you more information when a problem happens for a "real" payment. But the problem still occurs on the "real" payment and user experience is still degraded.

Background probing gives you the same information **before** problems happen for "real" payments.

Regards,
ZmnSCPxj
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev