[Lightning-dev] Proof-of-closure as griefing attack mitigation

ZmnSCPxj via Lightning-dev Tue, 31 Mar 2020 23:19:38 -0700

Introduction
============

Given the fact that contracts on offchain protocols need to be enforceable 
onchain as well, timelocks involved in multi-hop payments are measured in 
blocks.
This is because the blockchain can only (third-party-verifiably) enforce 
timeouts in units of entire blocks.
This leads to very long timeouts for payment delivery, thus multi-hop offchain 
payment attempts can be, deliberately or accidentally, be in a "pending" state 
up to the very large timeouts involved.


Deliberately setting up a multi-hop payment such that it will be in a "pending" 
state for long periods of time is colloquially known as a "griefing attack".
In this article, we assess various proposed solutions to mitigate the effects 
of griefing attacks, and propose a particular solution, proof-of-closure, as 
well, that requires significant changes to the update state machine.

Digression: Why Grief?
======================

Before embarking on our investigation for solutions to the griefing problem, we 
should first wonder if griefing is, in fact, a problem.

This brings up the question of: why would anybody grief at all?

Humans, like cats and other less-sapient pieces of walking meat, often find 
enjoyment in causing the suffering of others for no immediate direct gain to 
themselves, as a public demonstration of dominance over those they make suffer 
(aka "shits and giggles", which, if executed correctly, can lead to eventual 
direct gains to themselves or their progeny or relatives or allies, but such 
details are often outside the ken of the very beings who execute such survival 
strategies: brains are pieces of meat that have been hacked to act as 
action-reaction engines, but are not sophisticated enough to execute as pure 
rationality engines at all times).
Fortunately, in the Bitcoin world, only purely rational beings of pure 
selfishness can exist in the long run, thus we can neglect such motivations as 
mere noise.

First, let us investigate *how* griefing attacks can be performed.

* An intermediate node in a multi-hop attempt can delay forwarding or failing 
an incoming HTLC.
* A final node in a payment attempt can delay claiming an incoming HTLC.

Let us consider a purely rational intermediate node of pure selfishness:

* If it forwards as soon as possible, it can earn fees, and also speed up the 
release of the HTLC-locked funds so that they can reuse those funds as 
liquidity for further payment attempts.
* Thus, delaying an HTLC is not selfishly-rational for an intermediate node.

Thus, for an intermediate node, it seems there is no selfishly-rational 
motivation to execute a griefing attack on an arbitrary payment attempt.
We can then conclude that an intermediate that delays a payment would do so, 
not of its own rational self-interest, but as an accident, such as an 
unforeseen connectivity or power failure.

However, things are different when we consider a non-arbitrary payment.
Suppose a node were to make a payment attempt to itself, and deliberately delay 
claiming this self-payment.
This lets any single node, *who happens to own large liquidity*, to lock up the 
liquidity of other nodes.

The motivation to lock up the liquidity of other nodes is to *eliminate 
competition*.
Suppose we have a network as below:

    A -- B -- C
      \     /
       \   /
        \ /
         E

When A and C want to transact with one another, they may choose to route via 
either B or E.
B and E are therefore competitors in the business of forwarding payments.

But suppose E has much larger channels AE and CE than the channels of AB and CB.
For example, suppose E has 100mBTC perfectly-balanced channels while B has only 
10mBTC perfectly-balanced channels, as all things should be in simplified 
models of reality.
E can then "take out the competition" by making a 5mBTC self-payment along 
E->A->B->C->E and a 5mBTC self-payment along E->C->B->A->E, then refusing to 
claim the payment, tying up all the liquidity of the channels of B.
By doing so, it can ensure that A and C will always fail to pay via B, even if 
they wish to transact in amounts less than 5mBTC.
E thereby eliminates B as a competitor.

This demonstrates that griefing attacks will be motivated, such that such 
attacks will be performed by payers and payees *against intermediate nodes*.
Intermediate nodes have no motivation to attack payers and payees (those are 
their potential customers in the business of forwarding payments, and attacking 
potential customers is bad business: such attacking intermediate nodes will be 
removed economically in the long run).
However, payers and payees can become motivated to attack intermediate nodes, 
if the "payer" and "payee" are actually competitor intermediate nodes.

(We can observe that this is always a possibility even outside of Lightning: a 
service or product provider has no incentive to attack its customers ("the 
customer is always right"), but have an incentive to *pretend* to be a customer 
of a competitor and attack them.)

We will keep this fact in mind: active griefing attacks are attacks *on* 
intermediate nodes, not *by* intermediate nodes, because there is no economic 
incentive for intermediate nodes to attack their customers.

Previous Proposed Solutions
===========================

Time-Spent Reporting
--------------------

At each channel along the route, the time spent by a node to handle its 
forwarding is recorded, and reported upstream in the route.

Unfortunately, this solution protects payers from intermediate nodes and 
payees: it does not protect intermediate nodes from colluding payers and payees.
Even if an intermediate node knows that a particular node is consistently slow 
via a previous time-spent report, it will not be able, with our current onion 
routing, determine if an onion packet it just received will or will not go 
through the known-slow node.
Thus, an intermediate node would not be able to defend against distant payees 
that, with a colluding payer, will not claim a particular payment.

As we have established, an active griefing atttack will never be deliberately 
performed by a selfishly-rational intermediate node.
Thus, this solution protects against the wrong thing: it protects payers 
against slow/unreliable intermediate nodes, it does not protect intermediate 
nodes against malicious payer/payee collusions.
It protects only against intermediate nodes that inadvertently go offline 
during forwarding, but such nodes will inevitably lose out on the forwarding 
market anyway, and will disappear in the long run.

Up-Front Payment
----------------

Payers pay for an attempt, not just the successful completion of an attempt.

A variation on this is that the payer (or payee) continuously pays as long as 
the payment is pending.
Further variations include paying by other means, such as just locking funds or 
paying with proof-of-work.

While it certainly erects economic barriers against payer/payee collusions 
attacking intermediate nodes, it *also* erects economic barriers against 
normal, non-malicious payments.

We can consider that economic barriers against non-malicious, low-value, 
high-frequency payments ("micropayments") may be enough that such payments 
become infeasible if we impose up-front payment for mere attempts.
Thus, while this solution is certainly something we can consider, we must be 
reluctant to use it due to its up-front, strict-evaluation behavior.

Proof-Of-Closure
================

Observing the above, we want the properties for a "good" solution to griefing 
attacks to be:

* It should protect intermediate nodes against payer/payee collusions.
* It should only come into play upon detection of an attack.

We now present proof-of-closure, which (we hope) has the above properties.

We can consider instead a softer timeout, distinct from the HTLC block-based 
timeout.
This softer timeout is measurable in fractions of a second, e.g. units of 0.1 
seconds.

Each node on the network advertises, in addition to a block-based `cltv_delta`, 
a `timeout_delta` in units of 0.1 seconds.
Further, each invoice contains, in addition to a block-based `final_cltv`, a 
`final_timeout` in units of 0.1 seconds.

Thus, there are two timeouts:

* The current "hard" block-based timeout that is enforceable onchain.
* A new "soft" sidereal-time-based timeout that is not onchain enforceable.

The soft timeout, as mentioned, is not enforceable onchain.
Instead, enforcement of the soft timeout *is* the act of putting the channel 
state onchain.

Now, for the current "hard" block-based timeout, we already have a reaction.
If the HTLC "hard" timeout is approaching:

* Drop the channel onchain and enforce the hard timeout onchain to reclaim the 
funds in the HTLCs.
* Wait for the onchain action to be deeply resolved (either timelock or 
hashlock branch is confirmed deeply) and report the result (success or fail) 
upstream.

What happens if the "soft" timeout is violated?

* Drop the channel onchain.
* Report the channel closure upstream.

The "hard" timeout is cancelled in any of these two conditions:

* A success is reported via `update_fulfill_htlc`, OR,
* A failure is reported via `update_fail_htlc` AND the HTLC is irrevocably 
removed from the latest commitments/state(s) of the channel.

The "soft" timeout is cancelled in any of these three conditions, the first two 
of which are the same as above:

* A success is reported via `update_fulfill_htlc`, OR,
* A failure is reported via `update_fail_htlc` AND the HTLC is irrevocably 
removed from the latest commitments/state(s) of the channel, OR
* A channel closure is reported.

Let us fill this in more detail.

Suppose we have a payment route A->B->C->E.

Both the "hard" block timeouts and the "soft" second timeouts decrement 
monotonically at each hop.
Thus, the payee E has the shortest "hard" and "soft" timeouts (as normal).

* Suppose E then delays claiming the payment and violates the "soft" timeout.
* C then drops the CE channel onchain.
* C reports, before its own timeout (slightly larger than the timeout imposed 
on E), the closing of the channel CE, to B.
* B validates this report, and if valid, propagates the report to A.
* A validates this report, and if valid, accepts that the payment will be 
"stuck" for up to the hard timeout it imposed on B.

C has to report back to B in order to prevent B from closing the BC channel, 
and B has to report back to A in order to prevent A from closing the AB channel.
The decrementing seconds-unit timeouts are needed for each hop, for the same 
reason that decrementing block-unit timeouts are needed.

Since E is motivated to attack intermediate nodes because it wants to redirect 
payment forwards through itself rather than its competitotrs, having one of its 
channels closed (which prevents it from being used for forwarding) is directly 
opposed to its end goal of getting more money, thus, we can believe the action 
of closing a channel involved in a griefing attack is sufficient disincentive.

The major drawback is that enforcement of the soft timeout *is* a channel 
closure, which is generally a negative for the network.
This is not a remote attack vector, since a node can only trigger this closure 
if it is able to stall the fulfillment or failure of an HTLC on a channel, 
which generally means the node triggering this closure can only do so for its 
own channels (or it is able to, via a separate mechanism, remotely crash a 
different node).

Proving Channel Closes
----------------------

What C *really* needs to prove is that:

* It is *willing* to close a channel due to a violation of the soft timeout.
* The channel it is willing to close was, in fact, involved in the same payment 
attempt.

With the above, B can believe that C was innocent of wrongdoing, because:

* C would only be wiling to close a channel in case of a protocol violation, in 
this case, a violation of the soft timeout.
* The channel it closed was closed because of this payment attempt, and not 
because of another payment attempt, or some other unrelated channel being 
unilaterally closed.

First, what C needs to prove is *NOT*, in fact, actual channel closure: it 
needs to prove a *willingness* to close a channel.
Thus, it does not require the channel to actually be *closed* yet, i.e. it does 
not have to wait for onchain activity that the channel closure is in a mempool 
and is confirmed deeply onchain etc etc.

Thus, to prove a *willingness to close* rather than an actual close, C can 
provide the unilateral close of the channel CE.
The act of unilaterally closing a channel is the publication of the 
transaction(s) making up the unilateral close.
Thus, if C is *willing* to close the channel, it is willing to publish the 
transaction(s) involved, and thus, providing the unilateral close to B and 
further upstream, shows a willingness to close the channel.

B then validates the provided proof-of-closure by checking that the unilateral 
close transaction is either onchain, in the mempool, or that it spends a TXO 
that is not currently spent by another transaction.
In the case the unilateral close transaction is not confirmed and in the 
mempool, B can speed up its propagation on the Bitcoin layer by putting it in 
its own mempool as well --- after all, C is willing to close the channel to 
exonerate itself and punish the actual culprit, and B putting the unilateral 
close in its own mempool can only help C in what it is willing to do.

Secondly, C needs to prove that the channel it is willing to close involves the 
payment attempt, and is not some other channel closure that it is attempting to 
use to fulfill its own soft timeout.
Since the unilateral close transaction *is* the proof-of-closure, B (and A) can 
inspect the transaction outputs and see (with some additional data from C) that 
one of the outputs is to an HTLC that matches the payment hash.

Thus, B (and A) can believe that the proof-of-closure proves that whoever is 
presenting it is free of wrongdoing, as whoever is actually causing the delay 
has been punished (by someone being willing to close a channel with the 
culprit), and that the proof-of-closure commits to this particular payment 
attempt and no other (because it commits to a particular payment hash).

Further, if CE is closed by E dropping it onchain rather than C, C will still 
be able to fulfill its own soft timeout by taking the closing transaction from 
E, which should still contain the HTLC.
Indeed, neither A nor B will particularly care (nor need to know) who dropped 
the channel onchain, or (for A) that the channel participants are C and E.

Update State Shenanigans
------------------------

Bitcoin update mechanisms are complicated things, and it may be possible for an 
attacking payee E to fool around with the update state machine to make it 
difficult for C to report a willingness to close CE.

In particular, I quote here the relevant passages from `lightning-rfc`, 
`02-peer-protocol.md`, which is an implementation of the Poon-Dryja update 
mechanism:

> Thus each update traverses through the following states:
>
> 1. pending on the receiver
> 2. in the receiver's latest commitment transaction
> 3. ... and the receiver's previous commitment transaction has been revoked,
>    and the update is pending on the sender
> 4. ... and in the sender's latest commitment transaction
> 5. ... and the sender's previous commitment transaction has been revoked

The payee E is the "receiver" in this context.

In this case, once the update has reached step 2, then E has a commitment 
transaction that it can put onchain, that contains an HTLC it can claim.
>From this step onward, C cannot send a failure (i.e. it cannot send back an 
>`update_fail_htlc`) back to B, because E could drop its latest commitment 
>onchain and claim the HTLC onchain.

However, until step 4, C does not have a unilateral close containing the HTLC, 
and thus cannot provide a proof-of-closure that contains an HTLC that refers to 
the payment.

Thus, between steps 2 to 4, C cannot safely respond to its own soft timeout.
C cannot respond with a failure, as E could then drop its latest commitment 
transaction onchain and claim the payment from C, and extract money from C that 
way.
C also cannot respond with a proof-of-closure, as it does not have a 
transaction that it can use to provide this proof.

The best that C can do would be to impose an even shorter timeout between steps 
2 and 4 above, and to drop its current commitment transaction (which does not 
contain the HTLC yet and thus does not constitute a valid proof-of-closure) 
onchain.
In between the time it drops the commitment transaction and its own incoming 
soft timeout, there is a chance, however small, that this transaction will be 
confirmed, and the channel will (with high probability) settle in a state where 
the HTLC is not instantiated, thus C can safely fail its incoming HTLC (not 
show a proof-of-closure, since that is not possible for C to do) without risk 
of loss, just prior to its own soft timeout.

Of course, C is still at risk here: E could collude with miners via a 
side-channel fee offer to confirm its commitment transaction with the HTLC 
present, and ensure that C is liable for the HTLC value.

With Decker-Russell-Osuntokun, we can remove this risk by requiring a ritual as 
follows:

1.  C requests exclusive access to update their single shared state.
  * This can be done via a variety of sub-protocols, including a fair coin toss 
in case of near-simultaneous requests for exclusive locks on both sides.
2.  C provides the details of the new HTLC to E.
3.  C and E generate the new state transaction and exchange signatures for it.
4.  C and E generate (without signing) the new update transaction.
5.  E provides the signature (or share of signature, if MuSig) for the new 
update transaction to C.
6.  C provides the signature for the new update transaction to E, which 
releases the exclusive lock on the shared state atomically with the 
finalization of the new update transaction.

Prior to step 5, C can simply fail the incoming HTLC from B in case its own 
soft timeout is near.
Even if E performs step 5 after C has already failed the incoming HTLC from B, 
C can simply not perform step 6 and drop the channel onchain with the previous 
update and state transactions.

With Poon-Dryja, we will have to rearrange the order in which we perform 
things, effectively adding an extra communications turnaround between the 
participants.
Specifically, the order would have to be revised to:

> 1. pending on the sender
> 2. in the sender's latest commitment transaction
> 3. ... and the sender's previous commitment transaction has been revoked,
>    and the update is pending on the receiver
> 4. ... and in the receiver's latest commitment transaction
> 5. ... and the receiver's previous commitment transaction has been revoked

This allows the sender (C in our context) to provide a proof-of-closure after 
step 2, and before step 2, C can safely return a failure with 
`update_fail_htlc` (and refuse to proceed beyond step 2, thus ensuring it can 
still use the previous commitment that still has no HTLC).

Of course, this change will require redesigning the update state machine, 
increasing the number of communication turnarounds, and creating a subtle 
incompatbility when transitioning a payment from a hop that knows only the old 
update state machine to a hop that knows the new update state machine.

Purely Falsified Proof-Of-Closure
---------------------------------

Of course, the attacking node E might want to create a false proof-of-closure.
E can do this by simulating a Lightning channel: lock an amount of funds in a 
2-of-2 (where E controls both keys), then spend it in a set of transactions 
mimicking the unilateral close.

We observe, however, that the overhead of simulating a Lightning channel is the 
same as the overhead of actually creating and closing a Lightning channel.
Since the punishment of proof-of-closure is to force attackers to have their 
channels closed, we can consider that this simulation of a channel open and 
close is sufficient as well.

Future-Proofing
---------------

This sketch of proof-of-closure can be used for any update mechanism:

* With Poon-Dryja, C can use its own commitment transaction as the 
proof-of-closure.
* With Decker-Wattenhofer, C can give all the offchain transactions up to the 
last stage in the multi-stage decrementing-`nSequence` mechanism.
* With Deckker-Russell-Osuntokun, C can give the latest update and state 
trnsaction.

Basically, we expect that for now, and in the future, any update mechanism 
worth consideration will have a concept of "unilateral close" where a channel 
can be dropped onchain, using data that only one of the channel participants 
holds.

Such a unilateral close will be a sequence of one or more valid transactions, 
terminating in a transaction containing an HTLC-like contract in one of its 
outputs.

Thus, to validate the unilateral close, it is only required to validate all the 
transactions contained in the proof-of-closure, and see that the last 
transaction has an HTLC output.

The limitations are thus:

* The acceptable forms of HTLC would need to be agreed-upon by the entire 
network.
* Implementations would need to be able to assess, in a 
Bitcoin-consensus-compatible way, whether a transaction is valid or not.

Payment Decorrelation and Payment Points
----------------------------------------

Of course, having a single payment hash for the entire payment attempt is a 
privacy loss, which we intend to fix in the near future by using payment 
points, and adding a blinding scalar at each hop, aka. payment decorrelation.

Thus, in the future, there will not be any HTLC, but instead a PTLC.
Further, the payment point at each hop will be changed at each hop, in order to 
prevent decorrelation.

Thus, C needs to provide proofs:

* That an apparent singlesig on the unilateral close output is in fact a PTLC.
  C needs to provide:
  * A target point P.
  * A partial signature that would spend that singlesig for a particular 
sighash.
  * An adaptor signature which, with knowledge of the completed signature, 
adaptor signature, and sighash message, would have revealed the scalar behind P.
* That the PTLC belongs to the same payment attempt as what B offered to C.
  C needs to provide:
  * The C-only blinding factor that is the difference between the payment point 
of the B-to-C PTLC and the C-to-E PTLC on the unilateral close.

Then, when B needs to propagate the proof-of-closure back to A, B simply adds 
its own blinding factor to the reported blinding factor, in order to convince A 
that this is the same payment attempt.

As we have brought up privacy, we observe that, when this mechanism triggers, 
there is a mild privacy loss, in that intermediate nodes now know some channel 
closure that is related to this payment, and can thus determine the exact path 
that the payment attempt went through, at least until the channel being closed.
However, proof-of-closure is only propagated in case of violation of the soft 
timeout, so for normal non-malicious payments, proof-of-closure does not cause 
any privacy loss.
_______________________________________________
Lightning-dev mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev

[Lightning-dev] Proof-of-closure as griefing attack mitigation

Reply via email to