On Sat, 2015-03-07 at 08:54 +1100, Dave Chinner wrote:
> On Fri, Mar 06, 2015 at 11:13:10PM +0800, Ming Lei wrote:
> > Before commit b3fd4f03ca0b995 (locking/rwsem: Avoid deceiving lock
> > spinners), rwsem_spin_on_owner() returned false if the owner changed.
> > That commit makes it return true in that situation, after which a
> > kernel soft lockup can be triggered easily in xfstests.
> >
> > So this patch restores the previous behaviour; it should be
> > reasonable to stop spinning in case of heavy contention.
> >
> > The soft lockup can be reproduced easily in xfstests (generic/299)
> > over ext4:
> >
> > [ 236.417011] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [kworker/5:80:3288]
> > [ 236.417011] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw
> > [ 236.417011] CPU: 5 PID: 3288 Comm: kworker/5:80 Not tainted 4.0.0-rc1-next-20150303+ #69
> > [ 236.417011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 236.417011] Workqueue: dio/sda dio_aio_complete_work
> > [ 236.417011] task: ffff8800b87c0000 ti: ffff8800b703c000 task.ti: ffff8800b703c000
> > [ 236.417011] RIP: 0010:[<ffffffff81083c20>]  [<ffffffff81083c20>] __rcu_read_unlock+0x47/0x55
> > [ 236.417011] RSP: 0018:ffff8800b703fb98  EFLAGS: 00000246
> > [ 236.417011] RAX: 0000000000000000 RBX: ffff8800b703fb48 RCX: 000000000003b080
> > [ 236.417011] RDX: fffffffe00000001 RSI: ffff880231f03a20 RDI: ffff8800bb755568
> > [ 236.417011] RBP: ffff8800b703fba8 R08: ffff880227908078 R09: ffff8800b87c0000
> > [ 236.417011] R10: 0000000000000001 R11: 0000000000000020 R12: ffff8800b703c000
> > [ 236.417011] R13: ffff8800b87c0000 R14: 000000000000000f R15: 0000000000000101
> > [ 236.417011] FS:  0000000000000000(0000) GS:ffff88023eca0000(0000) knlGS:0000000000000000
> > [ 236.417011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 236.417011] CR2: 00007f549369f948 CR3: 00000000ba891000 CR4: 00000000000007e0
> > [ 236.417011] Stack:
> > [ 236.417011]  fffffffe00000001 ffff8800bb755568 ffff8800b703fbc8 ffffffff81073917
> > [ 236.417011]  ffff8800bb755568 ffff8800bb755584 ffff8800b703fc48 ffffffff814d1ba4
> > [ 236.417011]  ffff8800b703fbe8 ffff8800b87c0000 ffff8800b703fc78 ffffffff811cf51b
> > [ 236.417011] Call Trace:
> > [ 236.417011]  [<ffffffff81073917>] rwsem_spin_on_owner+0x2b/0x79
> > [ 236.417011]  [<ffffffff814d1ba4>] rwsem_down_write_failed+0xc0/0x2f1
> > [ 236.417011]  [<ffffffff811cf51b>] ? start_this_handle+0x494/0x4bd
> > [ 236.417011]  [<ffffffff810d1149>] ? trace_preempt_on+0x12/0x2f
> > [ 236.417011]  [<ffffffff812ae8f3>] call_rwsem_down_write_failed+0x13/0x20
> > [ 236.417011]  [<ffffffff814d1623>] ? down_write+0x24/0x33
> > [ 236.417011]  [<ffffffff81199404>] ext4_map_blocks+0x236/0x3cb
> > [ 236.417011]  [<ffffffff811bb407>] ? ext4_convert_unwritten_extents+0xd2/0x19c
> > [ 236.417011]  [<ffffffff811bcae4>] ? __ext4_journal_start_sb+0x77/0xb8
> > [ 236.417011]  [<ffffffff811bb42e>] ext4_convert_unwritten_extents+0xf9/0x19c
> > [ 236.417011]  [<ffffffff8119e214>] ext4_put_io_end+0x3a/0x5d
> > [ 236.417011]  [<ffffffff81197268>] ext4_end_io_dio+0x2a/0x2c
> > [ 236.417011]  [<ffffffff8116418c>] dio_complete+0x97/0x12d
> > [ 236.417011]  [<ffffffff81164333>] dio_aio_complete_work+0x21/0x23
>
> If you're getting stuck there, I'd be looking for a bug in ext4, not
> the rwsem code. There's no way there should be enough unwritten
> extent conversion pending to lock up the system for that length of
> time, especially considering the test has concurrent truncates
> running, which should drain the entire IO queue every couple of
> seconds at worst....
FYI, this issue is being handled here: https://lkml.org/lkml/2015/3/6/811

Thanks,
Davidlohr

