On 01/22/2016 06:10 AM, Dmitry Vyukov wrote:
> On Thu, Jan 21, 2016 at 6:51 PM, Peter Hurley <[email protected]> 
> wrote:
>> On 01/21/2016 02:20 AM, Peter Zijlstra wrote:
>>> On Thu, Jan 21, 2016 at 11:06:45AM +0100, Dmitry Vyukov wrote:
>>>> On Wed, Jan 20, 2016 at 5:08 PM, Peter Hurley <[email protected]> 
>>>> wrote:
>>>>> On 01/20/2016 05:02 AM, Peter Zijlstra wrote:
>>>>>> On Wed, Dec 30, 2015 at 11:44:01AM +0100, Dmitry Vyukov wrote:
>>>>>>> -> #3 (&buf->lock){+.+...}:
>>>>>>>        [<ffffffff813f0acf>] lock_acquire+0x19f/0x3c0 
>>>>>>> kernel/locking/lockdep.c:3585
>>>>>>>        [<     inline     >] __raw_spin_lock_irqsave 
>>>>>>> include/linux/spinlock_api_smp.h:112
>>>>>>>        [<ffffffff85c8e790>] _raw_spin_lock_irqsave+0x50/0x70 
>>>>>>> kernel/locking/spinlock.c:159
>>>>>>>        [<ffffffff82b8c050>] tty_get_pgrp+0x20/0x80 
>>>>>>> drivers/tty/tty_io.c:2502
>>>>>>
>>>>>> So in any recent code that I look at, this function tries to acquire
>>>>>> tty->ctrl_lock, not buf->lock. Am I missing something?!
>>>>>
>>>>> Yes.
>>>>>
>>>>> The tty locks were annotated with __lockfunc, so they were being
>>>>> elided from lockdep stack traces. Greg has a patch in his queue from
>>>>> me that removes the __lockfunc annotation ("tty: Remove __lockfunc
>>>>> annotation from tty lock functions").
>>>>>
>>>>> Unfortunately, I think syzkaller's post-processing of the stack trace
>>>>> isn't helping either, giving the impression that the stack is still
>>>>> inside tty_get_pgrp().
>>>>>
>>>>> It's not.
>>>>
>>>> I've got a new report on commit
>>>> a200dcb34693084e56496960d855afdeaaf9578f (Jan 18).
>>>> Here is unprocessed version:
>>>> https://gist.githubusercontent.com/dvyukov/428a0c9bfaa867d8ce84/raw/0754db31668602ad07947f9964238b2f9cf63315/gistfile1.txt
>>>> and here is processed one:
>>>> https://gist.githubusercontent.com/dvyukov/42b874213de82d94c35e/raw/2bbced252035821243678de0112e2ed3a766fb5d/gistfile1.txt
>>>>
>>>> Peter, what exactly is wrong with the post-processed version? I would
>>>> be interested in fixing the processing script.
>>>>
>>>> As far as I can see, it contains the same stacks, just with line
>>>> numbers and inlined frames. I am using a significantly different
>>>> compilation mode (kasan + kcov + very recent gcc), so nobody except me
>>>> would be able to figure out the line numbers from the offsets.
>>>
>>> I'm not sure anything is wrong with the post-processing. My confusion
>>> (and Jiri's) was that the stack trace lists
>>> tty_get_pgrp()->_raw_spin_lock_irqsave(), but that function acquires
>>> tty->ctrl_lock, not buf->lock as the report claims.
>>
>> I think it is the other way around: lockdep has graphed a genuine
>> circular dependency chain, but something is wrong with the stack traces.
>>
>>> PeterH says that lockdep omits functions tagged with __lockfunc, but my
>>> reading disagrees with that.
>>>
>>> kernel/locking/lockdep.c:save_trace()
>>>   save_stack_trace()
>>>     dump_trace(.ops = &save_stack_ops)
>>>
>>> which has: .address = save_stack_address
>>>   __save_stack_address(.nosched = false)
>>>
>>> So lockdep should very much include lock functions.
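(Agreed with that reading. For reference, the x86 callback around v4.4
looks roughly like this (arch/x86/kernel/stacktrace.c, condensed from
memory) -- the only filtering is the nosched check against scheduler
functions; nothing consults in_lock_functions():

static int __save_stack_address(void *data, unsigned long addr,
                                bool reliable, bool nosched)
{
        struct stack_trace *trace = data;

        if (nosched && in_sched_functions(addr))
                return 0;       /* skip scheduler internals only */
        if (trace->skip > 0) {
                trace->skip--;  /* drop the requested top frames */
                return 0;
        }
        if (trace->nr_entries < trace->max_entries) {
                trace->entries[trace->nr_entries++] = addr;
                return 0;
        }
        return -1;              /* trace buffer is full */
}

static int save_stack_address(void *data, unsigned long addr, int reliable)
{
        return __save_stack_address(data, addr, reliable, false);
}

So lock functions should indeed land in lockdep's saved traces.)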
>>
>> Wild hypothesis on my part.
>>
>>> But this confusion is part of the original report, not caused by the
>>> post-processing.
>>
>> Yeah, agreed. The original report is very odd.
> 
> Peter H, I have not yet applied your patch, but I got a slightly
> different-looking report (n_tracesink_datadrain wasn't mentioned
> before):

May or may not be related. And this report suffers from the same
deficiency as the previous one: the saved stack trace doesn't match
the lock dependency.

Without understanding why the stack trace doesn't match the lock
dependency, any proposed solution is just conjecture.
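
For anyone trying to follow the report below: stripped of the tty
specifics, the cycle lockdep claims reduces to the classic ABBA
inversion. A minimal userspace sketch (hypothetical demo code, not the
actual tty paths; only the lock ordering matters):

#include <pthread.h>

static pthread_mutex_t routelock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t writelock = PTHREAD_MUTEX_INITIALIZER;

/* Mirrors the flush_to_ldisc() worker in the report: routelock first,
 * then writelock, as in n_tracerouter_receivebuf() calling into
 * n_tracesink_datadrain().
 */
static void *worker_side(void *arg)
{
        pthread_mutex_lock(&routelock);
        pthread_mutex_lock(&writelock);
        pthread_mutex_unlock(&writelock);
        pthread_mutex_unlock(&routelock);
        return NULL;
}

/* Mirrors the existing #1..#4 chain, which transitively orders
 * writelock before routelock (via &buf->lock and friends).
 */
static void *ioctl_side(void *arg)
{
        pthread_mutex_lock(&writelock);
        pthread_mutex_lock(&routelock);
        pthread_mutex_unlock(&routelock);
        pthread_mutex_unlock(&writelock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        /* Can deadlock by design -- that is the point of the demo. */
        pthread_create(&a, NULL, worker_side, NULL);
        pthread_create(&b, NULL, ioctl_side, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}

Each thread can grab its first mutex and then block forever on the
other's, which is exactly the "Possible unsafe locking scenario" table
below, with the intermediate &buf->lock link elided.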

Regards,
Peter Hurley


> [ 1483.472724] ======================================================
> [ 1483.473033] [ INFO: possible circular locking dependency detected ]
> [ 1483.473033] 4.4.0+ #275 Not tainted
> [ 1483.473033] -------------------------------------------------------
> [ 1483.473033] kworker/u9:0/10808 is trying to acquire lock:
> [ 1483.473033]  (writelock){+.+...}, at: [<ffffffff82d1ee14>]
> n_tracesink_datadrain+0x24/0xc0
> [ 1483.473033]
> [ 1483.473033] but task is already holding lock:
> [ 1483.473033]  (routelock){+.+...}, at: [<ffffffff82d1e9b0>]
> n_tracerouter_receivebuf+0x20/0x40
> [ 1483.473033]
> [ 1483.473033] which lock already depends on the new lock.
> [ 1483.473033]
> [ 1483.473033]
> [ 1483.473033] the existing dependency chain (in reverse order) is:
> [ 1483.473033]
> -> #4 (routelock){+.+...}:
> [ 1483.473033]        [<ffffffff8145a28c>] lock_acquire+0x1dc/0x430
> [ 1483.473033]        [<ffffffff86328ef1>] mutex_lock_nested+0xb1/0xa50
> [ 1483.473033]        [<ffffffff82d1e9b0>] n_tracerouter_receivebuf+0x20/0x40
> [ 1483.473033]        [<ffffffff82d06484>] flush_to_ldisc+0x584/0x7f0
> [ 1483.473033]        [<ffffffff8139fd36>] process_one_work+0x796/0x1440
> [ 1483.473033]        [<ffffffff813a0abb>] worker_thread+0xdb/0xfc0
> [ 1483.473033]        [<ffffffff813b424f>] kthread+0x23f/0x2d0
> [ 1483.473033]        [<ffffffff86332f6f>] ret_from_fork+0x3f/0x70
> [ 1483.473033]
> -> #3 (&buf->lock){+.+...}:
> [ 1483.473033]        [<ffffffff8145a28c>] lock_acquire+0x1dc/0x430
> [ 1483.473033]        [<ffffffff8633255f>] _raw_spin_lock_irqsave+0x9f/0xd0
> [ 1483.473033]        [<ffffffff82ce4210>] tty_get_pgrp+0x20/0x80
> [ 1483.473033]        [<ffffffff82cf28da>] __isig+0x1a/0x50
> [ 1483.473033]        [<ffffffff82cf49ae>] isig+0xae/0x2c0
> [ 1483.473033]        [<ffffffff82cf7762>] n_tty_receive_signal_char+0x22/0xf0
> [ 1483.473033]        [<ffffffff82cfb332>]
> n_tty_receive_char_special+0x10d2/0x2b30
> [ 1483.473033]        [<ffffffff82cfe733>]
> n_tty_receive_buf_common+0x19a3/0x2400
> [ 1483.473033]        [<ffffffff82cff1c3>] n_tty_receive_buf2+0x33/0x40
> [ 1483.473033]        [<ffffffff82d062bf>] flush_to_ldisc+0x3bf/0x7f0
> [ 1483.473033]        [<ffffffff8139fd36>] process_one_work+0x796/0x1440
> [ 1483.473033]        [<ffffffff813a0abb>] worker_thread+0xdb/0xfc0
> [ 1483.473033]        [<ffffffff813b424f>] kthread+0x23f/0x2d0
> [ 1483.473033]        [<ffffffff86332f6f>] ret_from_fork+0x3f/0x70
> [ 1483.473033]
> -> #2 (&o_tty->termios_rwsem/1){++++..}:
> [ 1483.473033]        [<ffffffff8145a28c>] lock_acquire+0x1dc/0x430
> [ 1483.473033]        [<ffffffff8632e337>] down_read+0x47/0x60
> [ 1483.473033]        [<ffffffff82cfce1d>] 
> n_tty_receive_buf_common+0x8d/0x2400
> [ 1483.473033]        [<ffffffff82cff1c3>] n_tty_receive_buf2+0x33/0x40
> [ 1483.473033]        [<ffffffff82d062bf>] flush_to_ldisc+0x3bf/0x7f0
> [ 1483.473033]        [<ffffffff8139fd36>] process_one_work+0x796/0x1440
> [ 1483.473033]        [<ffffffff813a0abb>] worker_thread+0xdb/0xfc0
> [ 1483.473033]        [<ffffffff813b424f>] kthread+0x23f/0x2d0
> [ 1483.473033]        [<ffffffff86332f6f>] ret_from_fork+0x3f/0x70
> [ 1483.473033]
> -> #1 (&port->buf.lock/1){+.+...}:
> [ 1483.473033]        [<ffffffff8145a28c>] lock_acquire+0x1dc/0x430
> [ 1483.473033]        [<ffffffff86328ef1>] mutex_lock_nested+0xb1/0xa50
> [ 1483.473033]        [<ffffffff82d1ef53>] n_tracesink_open+0x23/0xf0
> [ 1483.473033]        [<ffffffff82d03978>] tty_ldisc_open.isra.2+0x78/0xd0
> [ 1483.473033]        [<ffffffff82d03f62>] tty_set_ldisc+0x292/0x8a0
> [ 1483.473033]        [<ffffffff82cecbbe>] tty_ioctl+0xb2e/0x2160
> [ 1483.473033]        [<ffffffff817ebfbc>] do_vfs_ioctl+0x18c/0xfb0
> [ 1483.473033]        [<ffffffff817ece6f>] SyS_ioctl+0x8f/0xc0
> [ 1483.473033]        [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
> [ 1483.473033]
> -> #0 (writelock){+.+...}:
> [ 1483.473033]        [<ffffffff8145692b>] __lock_acquire+0x31eb/0x4700
> [ 1483.473033]        [<ffffffff8145a28c>] lock_acquire+0x1dc/0x430
> [ 1483.473033]        [<ffffffff86328ef1>] mutex_lock_nested+0xb1/0xa50
> [ 1483.473033]        [<ffffffff82d1ee14>] n_tracesink_datadrain+0x24/0xc0
> [ 1483.473033]        [<ffffffff82d1e9bb>] n_tracerouter_receivebuf+0x2b/0x40
> [ 1483.473033]        [<ffffffff82d06484>] flush_to_ldisc+0x584/0x7f0
> [ 1483.473033]        [<ffffffff8139fd36>] process_one_work+0x796/0x1440
> [ 1483.473033]        [<ffffffff813a0abb>] worker_thread+0xdb/0xfc0
> [ 1483.473033]        [<ffffffff813b424f>] kthread+0x23f/0x2d0
> [ 1483.473033]        [<ffffffff86332f6f>] ret_from_fork+0x3f/0x70
> [ 1483.473033]
> [ 1483.473033] other info that might help us debug this:
> [ 1483.473033]
> [ 1483.473033] Chain exists of:
>   writelock --> &buf->lock --> routelock
> 
> [ 1483.473033]  Possible unsafe locking scenario:
> [ 1483.473033]
> [ 1483.473033]        CPU0                    CPU1
> [ 1483.473033]        ----                    ----
> [ 1483.473033]   lock(routelock);
> [ 1483.473033]                                lock(&buf->lock);
> [ 1483.473033]                                lock(routelock);
> [ 1483.473033]   lock(writelock);
> [ 1483.473033]
> [ 1483.473033]  *** DEADLOCK ***
> [ 1483.473033]
> [ 1483.473033] 5 locks held by kworker/u9:0/10808:
> [ 1483.473033]  #0:  ("events_unbound"){.+.+.+}, at:
> [<ffffffff8139fc3a>] process_one_work+0x69a/0x1440
> [ 1483.473033]  #1:  ((&buf->work)){+.+...}, at: [<ffffffff8139fc6a>]
> process_one_work+0x6ca/0x1440
> [ 1483.473033]  #2:  (&tty->ldisc_sem){++++++}, at:
> [<ffffffff82d0339b>] tty_ldisc_ref+0x1b/0x80
> [ 1483.473033]  #3:  (&buf->lock){+.+...}, at: [<ffffffff82d05fe1>]
> flush_to_ldisc+0xe1/0x7f0
> [ 1483.473033]  #4:  (routelock){+.+...}, at: [<ffffffff82d1e9b0>]
> n_tracerouter_receivebuf+0x20/0x40
> [ 1483.473033]
> [ 1483.473033] stack backtrace:
> [ 1483.473033] CPU: 1 PID: 10808 Comm: kworker/u9:0 Not tainted 4.4.0+ #275
> [ 1483.473033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Bochs 01/01/2011
> [ 1483.473033] Workqueue: events_unbound flush_to_ldisc
> [ 1483.473033]  00000000ffffffff ffff8800334ff7a0 ffffffff82994c8d
> ffffffff88faf110
> [ 1483.473033]  ffffffff88fdabb0 ffffffff88ff1ef0 ffff8800334ff7f0
> ffffffff8144fb58
> [ 1483.473033]  ffff88003a290000 ffff88003a2908ea 0000000000000000
> ffff88003a2908c8
> [ 1483.473033] Call Trace:
> [ 1483.473033]  [<ffffffff82994c8d>] dump_stack+0x6f/0xa2
> [ 1483.473033]  [<ffffffff8144fb58>] print_circular_bug+0x288/0x340
> [ 1483.473033]  [<ffffffff8145692b>] __lock_acquire+0x31eb/0x4700
> [ 1483.473033]  [<ffffffff81453740>] ? debug_check_no_locks_freed+0x3c0/0x3c0
> [ 1483.473033]  [<ffffffff81453740>] ? debug_check_no_locks_freed+0x3c0/0x3c0
> [ 1483.473033]  [<ffffffff81453740>] ? debug_check_no_locks_freed+0x3c0/0x3c0
> [ 1483.473033]  [<ffffffff8145a28c>] lock_acquire+0x1dc/0x430
> [ 1483.473033]  [<ffffffff82d1ee14>] ? n_tracesink_datadrain+0x24/0xc0
> [ 1483.473033]  [<ffffffff86328ef1>] mutex_lock_nested+0xb1/0xa50
> [ 1483.473033]  [<ffffffff82d1ee14>] ? n_tracesink_datadrain+0x24/0xc0
> [ 1483.473033]  [<ffffffff81452edd>] ? trace_hardirqs_on+0xd/0x10
> [ 1483.473033]  [<ffffffff82d1e9b0>] ? n_tracerouter_receivebuf+0x20/0x40
> [ 1483.473033]  [<ffffffff81452edd>] ? trace_hardirqs_on+0xd/0x10
> [ 1483.473033]  [<ffffffff86328e40>] ?
> mutex_lock_interruptible_nested+0xbe0/0xbe0
> [ 1483.473033]  [<ffffffff86328e40>] ?
> mutex_lock_interruptible_nested+0xbe0/0xbe0
> [ 1483.473033]  [<ffffffff86328e40>] ?
> mutex_lock_interruptible_nested+0xbe0/0xbe0
> [ 1483.473033]  [<ffffffff82d1ee14>] n_tracesink_datadrain+0x24/0xc0
> [ 1483.473033]  [<ffffffff82d1e990>] ? n_tracerouter_write+0x20/0x20
> [ 1483.473033]  [<ffffffff82d1e9bb>] n_tracerouter_receivebuf+0x2b/0x40
> [ 1483.473033]  [<ffffffff82d06484>] flush_to_ldisc+0x584/0x7f0
> [ 1483.473033]  [<ffffffff82d05f00>] ? tty_buffer_unlock_exclusive+0x120/0x120
> [ 1483.473033]  [<ffffffff8139fd36>] process_one_work+0x796/0x1440
> [ 1483.473033]  [<ffffffff8139fc6a>] ? process_one_work+0x6ca/0x1440
> [ 1483.473033]  [<ffffffff8139f5a0>] ? pwq_dec_nr_in_flight+0x2e0/0x2e0
> [ 1483.473033]  [<ffffffff813a0abb>] worker_thread+0xdb/0xfc0
> [ 1483.473033]  [<ffffffff813b424f>] kthread+0x23f/0x2d0
> [ 1483.473033]  [<ffffffff813a09e0>] ? process_one_work+0x1440/0x1440
> [ 1483.473033]  [<ffffffff813b4010>] ? kthread_create_on_node+0x3b0/0x3b0
> [ 1483.473033]  [<ffffffff813b4010>] ? kthread_create_on_node+0x3b0/0x3b0
> [ 1483.473033]  [<ffffffff86332f6f>] ret_from_fork+0x3f/0x70
> [ 1483.473033]  [<ffffffff813b4010>] ? kthread_create_on_node+0x3b0/0x3b0
> 
