Tested against v4.13-rc7. With this patchset applied, I/O no longer appears to
hang, but once (only once, not on every resume) I got the following stack trace
on resume:
===
[ 55.577173] ata1.00: Security Log not supported
[ 55.580690] ata1.00: configured for UDMA/100
[ 55.582257] ------------[ cut here ]------------
[ 55.583924] usb 1-1: reset high-speed USB device number 2 using xhci_hcd
[ 55.587489] WARNING: CPU: 3 PID: 646 at lib/percpu-refcount.c:361 percpu_ref_reinit+0x21/0x80
[ 55.590073] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt
kvm_intel bochs_drm ppdev kvm ttm iTCO_vendor_support drm_kms_helper irqbypass
8139too input_leds drm evdev psmouse led_class pcspkr syscopyarea joydev
sysfillrect lpc_ich 8139cp parport_pc sysimgblt mousedev intel_agp i2c_i801
fb_sys_fops mii mac_hid intel_gtt parport qemu_fw_cfg button sch_fq_codel
ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax raid10
md_mod sr_mod cdrom sd_mod hid_generic usbhid hid uhci_hcd crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_rng ahci xhci_pci
serio_raw pcbc ehci_pci xhci_hcd rng_core atkbd libps2 libahci ehci_hcd libata
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd
[ 55.611580] usbcore virtio_pci scsi_mod usb_common virtio_ring virtio
i8042 serio
[ 55.614305] CPU: 3 PID: 646 Comm: kworker/u8:23 Not tainted 4.13.0-pf1 #1
[ 55.616611] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 55.619903] Workqueue: events_unbound async_run_entry_fn
[ 55.621888] task: ffff88001b271e00 task.stack: ffffc90000a2c000
[ 55.623674] RIP: 0010:percpu_ref_reinit+0x21/0x80
[ 55.625751] RSP: 0000:ffffc90000a2fdb0 EFLAGS: 00010002
[ 55.628687] RAX: 0000000000000002 RBX: ffff88001dd80768 RCX: ffff88001dd80758
[ 55.631475] RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffffffff81f3e2f0
[ 55.633694] RBP: ffffc90000a2fdc0 R08: 0000000cc61e7800 R09: ffff88001f9929c0
[ 55.637144] R10: ffffffffffec3296 R11: 7fffffffffffffff R12: 0000000000000246
[ 55.642456] R13: ffff88001f410800 R14: ffff88001f414300 R15: 0000000000000000
[ 55.644832] FS: 0000000000000000(0000) GS:ffff88001f980000(0000) knlGS:0000000000000000
[ 55.647388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.649608] CR2: 00000000ffffffff CR3: 000000001aa50000 CR4: 00000000001406e0
[ 55.652688] Call Trace:
[ 55.654597] blk_unfreeze_queue+0x2f/0x50
[ 55.656794] scsi_device_resume+0x28/0x70 [scsi_mod]
[ 55.659059] scsi_dev_type_resume+0x38/0x90 [scsi_mod]
[ 55.660875] async_sdev_resume+0x15/0x20 [scsi_mod]
[ 55.662564] async_run_entry_fn+0x36/0x150
[ 55.664241] process_one_work+0x1de/0x430
[ 55.666018] worker_thread+0x47/0x3f0
[ 55.667387] kthread+0x125/0x140
[ 55.672740] ? process_one_work+0x430/0x430
[ 55.674971] ? kthread_create_on_node+0x70/0x70
[ 55.677110] ret_from_fork+0x25/0x30
[ 55.679098] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89
fb 48 c7 c7 f0 e2 f3 81 e8 0a de 32 00 49 89 c4 48 8b 43 08 a8 03 75 42 <0f>
ff 48 83 63 08 fd 65 ff 05 31 7d cc 7e 48 8b 53 08 f6 c2 03
[ 55.684822] ---[ end trace dbbf5aed3cf35331 ]---
[ 55.714306] PM: resume of devices complete after 500.175 msecs
[ 55.717299] OOM killer enabled.
===
Here:
===
355 void percpu_ref_reinit(struct percpu_ref *ref)
356 {
357         unsigned long flags;
358
359         spin_lock_irqsave(&percpu_ref_switch_lock, flags);
360
361         WARN_ON_ONCE(!percpu_ref_is_zero(ref)); // <--
362
363         ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
364         percpu_ref_get(ref);
365         __percpu_ref_switch_mode(ref, NULL);
366
367         spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
368 }
===
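To make the check at line 361 concrete, here is a minimal user-space sketch of
what it guards against. It is a toy model, not kernel code: struct toy_ref and
every toy_* function are hypothetical names, the counter is a plain long
instead of a per-CPU counter, and there is no locking. It only shows that
reinit expects the count to have dropped to zero first, so it warns both when
the ref was never killed and when a user still holds a reference. I can't tell
from the trace alone which of these (if either) blk_unfreeze_queue hit on
resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-threaded stand-in for struct percpu_ref. */
struct toy_ref {
	long count;	/* outstanding references, including the initial one */
	bool dead;	/* set by toy_ref_kill(), cleared by toy_ref_reinit() */
	int warned;	/* how often the reinit check tripped */
};

static void toy_ref_init(struct toy_ref *ref)
{
	ref->count = 1;		/* the initial reference */
	ref->dead = false;
	ref->warned = 0;
}

/* A new user may only enter while the ref is live. */
static bool toy_ref_tryget_live(struct toy_ref *ref)
{
	if (ref->dead)
		return false;
	ref->count++;
	return true;
}

static void toy_ref_put(struct toy_ref *ref)
{
	assert(ref->count > 0);
	ref->count--;
}

/* Mark the ref dead and drop the initial reference. */
static void toy_ref_kill(struct toy_ref *ref)
{
	ref->dead = true;
	toy_ref_put(ref);
}

/* Same shape as the excerpt: check for zero, clear dead, retake a reference. */
static void toy_ref_reinit(struct toy_ref *ref)
{
	if (ref->count != 0) {
		ref->warned++;
		fprintf(stderr, "toy WARN: reinit with count=%ld\n", ref->count);
	}
	ref->dead = false;
	ref->count++;
}

/* Returns how many times the check tripped for a given call order. */
static int toy_scenario(bool kill_first, bool user_still_active)
{
	struct toy_ref ref;

	toy_ref_init(&ref);
	if (user_still_active)
		toy_ref_tryget_live(&ref);	/* a user that never calls put */
	if (kill_first)
		toy_ref_kill(&ref);
	toy_ref_reinit(&ref);
	return ref.warned;
}
```

In this model toy_scenario(true, false) returns 0 (kill, then reinit on a
drained ref), while toy_scenario(false, false) and toy_scenario(true, true)
each return 1 (reinit without a prior kill, and reinit while a reference is
still held).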
On Thursday, 31 August 2017 19:38:34 CEST Ming Lei wrote:
> On Thu, Aug 31, 2017 at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> > Since I'm in CC, does this series aim to replace 2 patches I've tested
> > before:
> >
> > blk-mq: add requests in the tail of hctx->dispatch
> > blk-mq: align to legacy's implementation of blk_execute_rq
> >
> > ?
>
> Yeah, this solution is more generic, and the old approach in the above
> two patches may eventually run into the same deadlock.
>
> Oleksandr, could you test this patchset and provide feedback?
>
> BTW, it fixes the I/O hang in my raid10 test, but I only wrote
> 'devices' to pm_test.
>
> Thanks!