(cc'ing Tetuso and quoting whole message)
Tetuso, could this be the same problem as the hang in wb_shutdown()
syszbot reported?
On Wed, Apr 25, 2018 at 05:07:48PM -0300, Fabiano Rosas wrote:
> I'm looking into an issue where removing a virtio disk via sysfs while another
> process is issuing write() calls results in the writing task going into a
> livelock:
>
>
> root@guest # cat test.sh
> #!/bin/bash
>
> dd if=/dev/zero of=/dev/vda bs=1M count=10000 &
> sleep 1
> echo 1 > /sys/bus/pci/devices/0000:00:04.0/remove
>
> root@guest # ls /dev/vd*
> /dev/vda
>
> root@guest # grep Dirty /proc/meminfo
> Dirty: 0 kB
>
> root@guest # sh test.sh
>
> root@guest # ps aux | grep "[d]d if"
> root 1699 38.6 0.0 111424 1216 hvc0 D+ 10:48 0:01 dd
> if=/dev/zero of=/dev/vda bs=1M count=10000
>
> root@guest # ls /dev/vd*
> ls: cannot access /dev/vd*: No such file or directory
>
> root@guest # cat /proc/1699/stack
> [<0>] 0xc0000000ffe28218
> [<0>] __switch_to+0x31c/0x480
> [<0>] balance_dirty_pages+0x990/0xb90
> [<0>] balance_dirty_pages_ratelimited+0x50c/0x6c0
> [<0>] generic_perform_write+0x1b0/0x260
> [<0>] __generic_file_write_iter+0x200/0x240
> [<0>] blkdev_write_iter+0xa4/0x150
> [<0>] __vfs_write+0x14c/0x240
> [<0>] vfs_write+0xd0/0x240
> [<0>] ksys_write+0x6c/0x110
> [<0>] system_call+0x58/0x6c
>
> root@guest # grep Dirty /proc/meminfo
> Dirty: 1506816 kB
>
> ---
>
> I have done some tracing and I believe this is caused by the clearing of
> 'WB_registered' in 'wb_shutdown':
>
> <snip>
> sh-1697 [000] .... 3994.541664: sysfs_remove_link <-del_gendisk
> sh-1697 [000] .... 3994.541671: wb_shutdown <-bdi_unregister
> <snip>
>
> Later, when 'balance_dirty_pages' tries to start writeback, it doesn't happen
> because 'WB_registered' is not set:
>
> fs/fs-writeback.c
>
> static void wb_wakeup(struct bdi_writeback *wb)
> {
> spin_lock_bh(&wb->work_lock);
> if (test_bit(WB_registered, &wb->state))
> mod_delayed_work(bdi_wq, &wb->dwork, 0);
> spin_unlock_bh(&wb->work_lock);
> }
>
> So we get stuck in a loop in 'balance_dirty_pages':
>
> root@guest # cat /sys/kernel/debug/tracing/set_ftrace_filter
> balance_dirty_pages_ratelimited
> balance_dirty_pages
> wb_wakeup
> wb_workfn
> io_schedule_timeout
>
> <snip>
> dd-1699 [000] .... 11192.535946: wb_wakeup <-balance_dirty_pages
> dd-1699 [000] .... 11192.535950: io_schedule_timeout <-balance_dirty_pages
> dd-1699 [000] .... 11192.745968: wb_wakeup <-balance_dirty_pages
> dd-1699 [000] .... 11192.745972: io_schedule_timeout <-balance_dirty_pages
> <snip>
>
>
> The test on 'WB_registered' before starting the writeback task is introduced
> by: "5acda9 bdi: avoid oops on device removal".
>
> I have made a *naive* attempt at fixing it by allowing writeback to happen
> even
> without 'WB_registered':
>
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index d4d04fe..050b067 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -982,7 +982,7 @@ void wb_start_background_writeback(struct bdi_writeback
> *wb)
> * writeback as soon as there is no other work to do.
> */
> trace_writeback_wake_background(wb);
> - wb_wakeup(wb);
> + mod_delayed_work(bdi_wq, &wb->dwork, 0);
> }
>
> /*
> @@ -1933,7 +1933,7 @@ void wb_workfn(struct work_struct *work)
> struct bdi_writeback, dwork);
> long pages_written;
>
> - set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
> + set_worker_desc("flush-%s", wb->bdi->dev ? dev_name(wb->bdi->dev) : "?"
> );
> current->flags |= PF_SWAPWRITE;
>
> if (likely(!current_is_workqueue_rescuer() ||
> --
>
>
> The effect of that is that the 'dd' process now finishes successfully and we
> get "Buffer I/O error"s for the dirty pages that remain. I believe this to be
> in conformance with existing interfaces since dd does not issue any fsync()
> calls.
>
>
> Does my analysis make any sense and would something along these lines be
> acceptable as a solution?
>
>
> Cheers
>
--
tejun