Re: BTRFS deadlock (btrfs_join_transaction?)

Zach Brown Tue, 22 Jan 2013 11:52:01 -0800

> kernel: 3.7.2 from kernel.org

> [14501.689372] BUG: soft lockup - CPU#2 stuck for 22s!
> [14501.689446] CPU 2
> [14501.689452] Pid: 29021, comm: btrfs-delayed-m Not tainted
> 3.7.2-custom2 #1 Intel Corporation S2600IP/S2600IP
> [14501.689455] RIP: 0010:[<ffffffff81044ab5>] [<ffffffff81044ab5>]
> __ticket_spin_lock+0x25/0x30


So stuck spinning on a spinlock.

> [14501.689523] Call Trace:
> [14501.689533]  [<ffffffff816a0b6e>] _raw_spin_lock+0xe/0x20
> [14501.689560]  [<ffffffffa018db85>] join_transaction.isra.26+0x25/0x370 
> [btrfs]

Probably the first trans_lock acquisition in join_transaction().

> exact same message repeats 28 seconds later, and then it is followed
> by: pastebin.com/349ikn0c

All 16 CPUs have traces in that dump, but only this stuck CPU's trace
seems interesting.

> Any ideas?

It doesn't look like there are any easy answers in the code: no
unbalanced locks and unlocks, and nothing scary done while holding the
lock.  (There's some list traversal, but the traces don't show another
CPU stuck spinning on a corrupt list.)

If I had to guess, I'd guess that the lock got corrupted somehow.  Maybe
a race that lets delayed work run on a freed structure.
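
To illustrate why a corrupted lock would look exactly like this trace,
here is a simplified single-threaded userspace model of a ticket lock.
It is only a sketch: the struct layout, the function names, and the
bounded spin count are assumptions made for illustration, not the
kernel's actual __ticket_spin_lock implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: "head" is the ticket being served, "tail" is the
 * next ticket to hand out.  The lock is free when head == tail. */
struct ticket_lock {
    uint16_t head;
    uint16_t tail;
};

/* Take a ticket and spin until it is served.  The real lock spins
 * forever; this model gives up after max_spins so the stuck case can
 * be observed.  Returns true if the lock was acquired. */
static bool ticket_lock_bounded(struct ticket_lock *l, unsigned max_spins)
{
    uint16_t me = l->tail++;

    for (unsigned i = 0; i < max_spins; i++) {
        if (l->head == me)
            return true;
    }
    return false;
}

/* Release the lock by serving the next ticket. */
static void ticket_unlock(struct ticket_lock *l)
{
    assert(l->head != l->tail);  /* must currently be held */
    l->head++;
}
```

With a zeroed lock, ticket_lock_bounded() succeeds on the first check.
If something scribbles over the lock so that head and tail differ while
nobody holds it (say head = 0, tail = 5), the new ticket is never served
and the caller spins with no owner to release it, which would match a
soft lockup in _raw_spin_lock with no other CPU visibly holding the lock.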

Would it be possible to enable some debugging options in the kernel
you're building?  DEBUG_LIST, DEBUG_SPINLOCK, and the various lockdep
options (DEBUG_LOCKDEP, PROVE_LOCKING) might raise an alarm that would
shed some light.  Hopefully they wouldn't be unusably slow.
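
As a .config fragment, those options would look roughly like the
following.  This is a sketch that assumes the usual CONFIG_ prefix; some
of these depend on or select other debug options, so running
"make oldconfig" after editing should sort out the rest.

```
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_PROVE_LOCKING=y
```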

- z
