On Oct 11, 2024, at 00:20, Kent Overstreet <[email protected]> wrote:
> 
> On Thu, Oct 10, 2024 at 09:21:50PM GMT, Alan Huang wrote:
>> If the transaction chose itself as a victim before and restarted, it
>> might request a no fail lock request this time. But it might be added to
>> others' lock graph and be chose as the victim again, it's no longer safe
>> without additional check. We can also convert the cycle detector to be
>> fully RCU-based to solve that unsoundness, but the latency added to trans_put
>> and additional memory required may not worth it.
> 
> This is going to need ascii art diagrams...

Assume (txn represents transaction):
        
        txn1 owns lock1 and requests lock2
        txn2 owns lock2 and requests lock1

txn2 runs cycle detector first, so it has added itself to lock1’s wait list,
txn1 now requests lock2, since txn2 haven’t lock lock2’s waitlist lock, so txn1 
added itself to lock2’s waitlist.
txn1 now runs cycle detector and detects it will have a cycle and choose txn2 
as the victim.
txn2 now also detects a cycle, and choose itself as the victim, so it restarts.
txn2 now requests a no fail lock and runs cycle detector again and sleep
txn1 sets txn2’s lock_must_abort to true and wake up txn2.

So txn2 is going to abort.

> 
> I would expect the lock waitlist locks and sequence numbers to protect
> against this kind of race, do they not?
> 
>> 
>> Signed-off-by: Alan Huang <[email protected]>
>> ---
>> fs/bcachefs/btree_locking.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/fs/bcachefs/btree_locking.c b/fs/bcachefs/btree_locking.c
>> index bf25fd0a1cca..fff176382e30 100644
>> --- a/fs/bcachefs/btree_locking.c
>> +++ b/fs/bcachefs/btree_locking.c
>> @@ -293,7 +293,7 @@ int bch2_check_for_deadlock(struct btree_trans *trans, 
>> struct printbuf *cycle)
>> 
>> g.nr = 0;
>> 
>> - if (trans->lock_must_abort) {
>> + if (trans->lock_must_abort && !trans->lock_may_not_fail) {
>> if (cycle)
>> return -1;
>> 
>> -- 
>> 2.45.2
>> 


Reply via email to