On Wed, 2017-05-31 at 13:21 -0400, Dave Jones wrote:
> Just hit this during a trinity run.

925bb1ce47f429f69aad35876df7ecd8c53deb7e is the first bad commit
commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e
Author: Vegard Nossum <vegard.nos...@oracle.com>
Date:   Thu May 11 12:18:52 2017 +0200

    tty: fix port buffer locking
    
    tty_insert_flip_string_fixed_flag() is racy against itself when called
    from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc()
    workqueue path [2].
    
    The problem is that port->buf.tail->used is modified without consistent
    locking; the ioctl path takes tty->atomic_write_lock, whereas the workqueue
    path takes ldata->output_lock.
    
    We cannot simply take ldata->output_lock, since that is specific to the
    N_TTY line discipline.
    
    It might seem natural to try to take port->buf.lock inside
    tty_insert_flip_string_fixed_flag() and friends (where port->buf is
    actually used/modified), but this creates problems for flush_to_ldisc()
    which takes it before grabbing tty->ldisc_sem, o_tty->termios_rwsem,
    and ldata->output_lock.
    
    Therefore, the simplest solution for now seems to be to take
    tty->atomic_write_lock inside tty_port_default_receive_buf(). This lock
    is also used in the write path [3] with a consistent ordering.
    
    [1]: Call Trace:
     tty_insert_flip_string_fixed_flag
     pty_write
     tty_send_xchar                     // down_read(&o_tty->termios_rwsem)
                                        // mutex_lock(&tty->atomic_write_lock)
     n_tty_ioctl_helper
     n_tty_ioctl
     tty_ioctl                          // down_read(&tty->ldisc_sem)
     do_vfs_ioctl
     SyS_ioctl
    
    [2]: Workqueue: events_unbound flush_to_ldisc
    Call Trace:
     tty_insert_flip_string_fixed_flag
     pty_write
     tty_put_char
     __process_echoes
     commit_echoes                      // mutex_lock(&ldata->output_lock)
     n_tty_receive_buf_common
     n_tty_receive_buf2
     tty_ldisc_receive_buf              // down_read(&o_tty->termios_rwsem)
     tty_port_default_receive_buf       // down_read(&tty->ldisc_sem)
     flush_to_ldisc                     // mutex_lock(&port->buf.lock)
     process_one_work
    
    [3]: Call Trace:
     tty_insert_flip_string_fixed_flag
     pty_write
     n_tty_write                        // mutex_lock(&ldata->output_lock)
                                        // down_read(&tty->termios_rwsem)
     do_tty_write (inline)              // mutex_lock(&tty->atomic_write_lock)
     tty_write                          // down_read(&tty->ldisc_sem)
     __vfs_write
     vfs_write
     SyS_write
    
    The bug can result in about a dozen different crashes depending on what
    exactly gets corrupted when port->buf.tail->used points outside the
    buffer.
    
    The patch passes my LOCKDEP/PROVE_LOCKING testing but more testing is
    always welcome.
    
    Found using syzkaller.
    
    Cc: <sta...@vger.kernel.org>
    Signed-off-by: Vegard Nossum <vegard.nos...@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

:040000 040000 6df9d7d2f22ff6ac1ad265869c1cb0d15621a0eb 
56e5e66a5e451bf55857025899f4a092b51dca6e M      drivers

git bisect start
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect good a351e9b9fc24e982ec2f0e76379a49826036da12
# bad: [104c08ba8e921ef97abfdc10408d54921a6d9003] Merge tag 'acpi-4.12-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 104c08ba8e921ef97abfdc10408d54921a6d9003
# good: [16a12fa9aed176444fc795b09e796be41902bb08] Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 16a12fa9aed176444fc795b09e796be41902bb08
# good: [2d3e4866dea96b0506395b47bfefb234f2088dac] Merge tag 'for-linus' of 
git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 2d3e4866dea96b0506395b47bfefb234f2088dac
# good: [f94c128eefcce2e3448d543f13cd7d7b8aa660a5] Merge tag 'metag-for-v4.12' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
git bisect good f94c128eefcce2e3448d543f13cd7d7b8aa660a5
# good: [18365225f0440d09708ad9daade2ec11275c3df9] hwpoison, memcg: forcibly 
uncharge LRU pages
git bisect good 18365225f0440d09708ad9daade2ec11275c3df9
# good: [5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11] nvme_fc: Support 
ctrl_loss_tmo
git bisect good 5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11
# good: [d024baa58a4a7e5eb6058017771d15b9e47b56db] Merge branch 
'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d024baa58a4a7e5eb6058017771d15b9e47b56db
# bad: [393bcfaeb8be7f46a4cd7d673e33541ebee76b12] Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
git bisect bad 393bcfaeb8be7f46a4cd7d673e33541ebee76b12
# bad: [b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08] kthread: fix boot hang 
(regression) on MIPS/OpenRISC
git bisect bad b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08
# good: [6f68a6ae1f0ea2fd87e26dc345866e928b5850a8] Merge tag 'powerpc-4.12-4' 
of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 6f68a6ae1f0ea2fd87e26dc345866e928b5850a8
# bad: [da4c279942b05727088774df224c0734688b4cbc] serial: enable serdev support
git bisect bad da4c279942b05727088774df224c0734688b4cbc
# good: [5c9d6abed9e0a061de252a53ab687a1171502e81] serial: altera_jtaguart: 
adding iounmap()
git bisect good 5c9d6abed9e0a061de252a53ab687a1171502e81
# bad: [925bb1ce47f429f69aad35876df7ecd8c53deb7e] tty: fix port buffer locking
git bisect bad 925bb1ce47f429f69aad35876df7ecd8c53deb7e
# good: [11d4d32158eeaf36fe1073a4a260193d7a19ccf1] tty: ehv_bytechan: clean up 
init error handling
git bisect good 11d4d32158eeaf36fe1073a4a260193d7a19ccf1
# first bad commit: [925bb1ce47f429f69aad35876df7ecd8c53deb7e] tty: fix port 
buffer locking

[   56.727594] ======================================================
[   56.728482] WARNING: possible circular locking dependency detected
[   56.729318] 4.12.0-master #108 Tainted: G            E  
[   56.730125] ------------------------------------------------------
[   56.731024] kworker/u8:3/329 is trying to acquire lock:
[   56.731774]  (&buf->lock){+.+...}, at: [<ffffffff81423e97>] 
tty_buffer_flush+0x37/0xa0
[   56.732773] 
               but task is already holding lock:
[   56.733444]  (&o_tty->termios_rwsem/1){++++..}, at: [<ffffffff8141fb91>] 
isig+0x41/0xf0
[   56.734303] 
               which lock already depends on the new lock.

[   56.734868] 
               the existing dependency chain (in reverse order) is:
[   56.735377] 
               -> #2 (&o_tty->termios_rwsem/1){++++..}:
[   56.735799]        lock_acquire+0xbd/0x200
[   56.736071]        down_read+0x4c/0x70
[   56.736323]        n_tty_write+0x90/0x450
[   56.736584]        tty_write+0x159/0x2e0
[   56.736844]        __vfs_write+0x28/0x140
[   56.737119]        vfs_write+0xc4/0x1c0
[   56.737376]        redirected_tty_write+0x5c/0xa0
[   56.737681]        __do_readv_writev+0x2b4/0x350
[   56.737980]        do_readv_writev+0x63/0x80
[   56.738263]        vfs_writev+0x3c/0x50
[   56.738513]        do_writev+0x55/0xd0
[   56.738758]        SyS_writev+0x10/0x20
[   56.739009]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[   56.739345] 
               -> #1 (&tty->atomic_write_lock){+.+.+.}:
[   56.739760]        lock_acquire+0xbd/0x200
[   56.740026]        __mutex_lock+0x75/0x8e0
[   56.740297]        mutex_lock_nested+0x1b/0x20
[   56.740585]        tty_port_default_receive_buf+0x48/0x90
[   56.740932]        flush_to_ldisc+0x7f/0xa0
[   56.741233]        process_one_work+0x1c5/0x5c0
[   56.741528]        worker_thread+0x4d/0x3c0
[   56.741800]        kthread+0x114/0x150
[   56.742046]        ret_from_fork+0x2a/0x40
[   56.742318] 
               -> #0 (&buf->lock){+.+...}:
[   56.742664]        __lock_acquire+0x13e1/0x1400
[   56.742957]        lock_acquire+0xbd/0x200
[   56.743228]        __mutex_lock+0x75/0x8e0
[   56.743496]        mutex_lock_nested+0x1b/0x20
[   56.743784]        tty_buffer_flush+0x37/0xa0
[   56.744068]        pty_flush_buffer+0x27/0x80
[   56.744462]        tty_driver_flush_buffer+0x1b/0x20
[   56.744776]        isig+0x95/0xf0
[   56.744999]        n_tty_receive_signal_char+0x1c/0x70
[   56.745338]        n_tty_receive_char_special+0x863/0xb00
[   56.745660]        n_tty_receive_buf_common+0x7ae/0xac0
[   56.746013]        n_tty_receive_buf2+0x14/0x20
[   56.746311]        tty_ldisc_receive_buf+0x23/0x50
[   56.746598]        tty_port_default_receive_buf+0x59/0x90
[   56.746960]        flush_to_ldisc+0x7f/0xa0
[   56.747237]        process_one_work+0x1c5/0x5c0
[   56.747509]        worker_thread+0x4d/0x3c0
[   56.747761]        kthread+0x114/0x150
[   56.748016]        ret_from_fork+0x2a/0x40
[   56.748300] 
               other info that might help us debug this:

[   56.748835] Chain exists of:
                 &buf->lock --> &tty->atomic_write_lock --> 
&o_tty->termios_rwsem/1

[   56.749581]  Possible unsafe locking scenario:

[   56.749994]        CPU0                    CPU1
[   56.750294]        ----                    ----
[   56.750564]   lock(&o_tty->termios_rwsem/1);
[   56.750855]                                lock(&tty->atomic_write_lock);
[   56.751314]                                lock(&o_tty->termios_rwsem/1);
[   56.751808]   lock(&buf->lock);
[   56.752021] 
                *** DEADLOCK ***

[   56.752380] 6 locks held by kworker/u8:3/329:
[   56.752641]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8108861f>] 
process_one_work+0x13f/0x5c0
[   56.753203]  #1:  ((&buf->work)){+.+...}, at: [<ffffffff8108861f>] 
process_one_work+0x13f/0x5c0
[   56.753717]  #2:  (&port->buf.lock/1){+.+...}, at: [<ffffffff81423935>] 
flush_to_ldisc+0x25/0xa0
[   56.754243]  #3:  (&tty->ldisc_sem){++++.+}, at: [<ffffffff81422ccf>] 
tty_ldisc_ref+0x1f/0x60
[   56.754748]  #4:  (&tty->atomic_write_lock){+.+.+.}, at: 
[<ffffffff814242b8>] tty_port_default_receive_buf+0x48/0x90
[   56.755373]  #5:  (&o_tty->termios_rwsem/1){++++..}, at: 
[<ffffffff8141fb91>] isig+0x41/0xf0
[   56.755875] 
               stack backtrace:
[   56.756139] CPU: 2 PID: 329 Comm: kworker/u8:3 Tainted: G            E   
4.12.0-master #108
[   56.756637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[   56.757327] Workqueue: events_unbound flush_to_ldisc
[   56.757629] Call Trace:
[   56.757830]  dump_stack+0x85/0xc9
[   56.758048]  print_circular_bug+0x1f9/0x207
[   56.758315]  __lock_acquire+0x13e1/0x1400
[   56.758564]  ? check_irq_usage+0x83/0xb0
[   56.758882]  lock_acquire+0xbd/0x200
[   56.759113]  ? tty_buffer_flush+0x37/0xa0
[   56.759463]  ? tty_buffer_flush+0x37/0xa0
[   56.759737]  __mutex_lock+0x75/0x8e0
[   56.759998]  ? tty_buffer_flush+0x37/0xa0
[   56.760250]  ? isig+0x5e/0xf0
[   56.760433]  ? tty_buffer_flush+0x37/0xa0
[   56.760677]  ? find_held_lock+0x35/0xa0
[   56.760957]  ? isig+0x8d/0xf0
[   56.761140]  mutex_lock_nested+0x1b/0x20
[   56.761383]  tty_buffer_flush+0x37/0xa0
[   56.761617]  pty_flush_buffer+0x27/0x80
[   56.761870]  tty_driver_flush_buffer+0x1b/0x20
[   56.762162]  isig+0x95/0xf0
[   56.762340]  n_tty_receive_signal_char+0x1c/0x70
[   56.762619]  n_tty_receive_char_special+0x863/0xb00
[   56.762960]  n_tty_receive_buf_common+0x7ae/0xac0
[   56.763250]  ? tty_ldisc_ref+0x1f/0x60
[   56.763478]  n_tty_receive_buf2+0x14/0x20
[   56.763721]  tty_ldisc_receive_buf+0x23/0x50
[   56.763983]  ? mutex_lock_nested+0x1b/0x20
[   56.764238]  tty_port_default_receive_buf+0x59/0x90
[   56.764533]  flush_to_ldisc+0x7f/0xa0
[   56.764756]  process_one_work+0x1c5/0x5c0
[   56.765075]  ? process_one_work+0x13f/0x5c0
[   56.765366]  worker_thread+0x4d/0x3c0
[   56.765590]  kthread+0x114/0x150
[   56.765787]  ? process_one_work+0x5c0/0x5c0
[   56.766094]  ? kthread_create_on_node+0x40/0x40
[   56.766374]  ret_from_fork+0x2a/0x40

Reply via email to