------- Comment From mdr...@us.ibm.com 2018-05-04 09:09 EDT-------
(In reply to comment #15)
> This is not the same as the original bug, but I suspect they are part of a
> class of issues we're hitting while running under very particular
> circumstances which might not generally be seen during normal operation and
> triggering various corner cases. As such I think it makes sense to group
> them under this bug for the time being.
>
> The general workload is running IO-heavy disk workloads on large guests
> (20GB memory, 16 vcpus) with SAN-based storage, and then performing
> migration during the workload. During migration we begin to see a high
> occurrence of rcu_sched stall warnings, and after 1-3 hours of operations
> we hit filesystem-related crashes like the ones posted. We've seen this with
> 2 separate FC cards, emulex and qlogic, where we invoke QEMU through libvirt
> as:
We've been gathering additional traces while running under this scenario, and while so far most of the traces have been filesystem-related, we now have a couple that suggest the common thread between all of these failures is that they involve RCU-managed data structures. I'll attach the summaries for these from xmon; they include the full dmesg log since guest start, as well as timestamps in dmesg noting where migration has started/stopped, and "WATCHDOG" messages noting any large jumps in wall-clock time.

For example (from boslcp3g1-migtest-fail-on-lcp5):

[ 5757.347542] migration iteration 7: started at Thu May 3 05:59:14 CDT 2018

<guest will get stopped by libvirt after 60 seconds, then after migration finishes it will be resumed and we immediately see the below>

[ 5935.727884] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 5935.728567] 1-...!: (670 GPs behind) idle=486/140000000000000/0 softirq=218179/218180 fqs=0
[ 5935.730091] 2-...!: (3750 GPs behind) idle=006/140000000000000/0 softirq=203335/203335 fqs=0
[ 5935.731076] 4-...!: (96 GPs behind) idle=c2e/140000000000000/0 softirq=168607/168608 fqs=0
[ 5935.731783] 5-...!: (2270 GPs behind) idle=e16/140000000000000/0 softirq=152608/152608 fqs=1
[ 5935.732959] 6-...!: (322 GPs behind) idle=3ca/140000000000001/0 softirq=169452/169453 fqs=1
[ 5935.735061] 8-...!: (6 GPs behind) idle=c36/140000000000001/0 softirq=280514/280516 fqs=1
[ 5935.736638] 9-...!: (5 GPs behind) idle=c1e/140000000000001/0 softirq=248247/248249 fqs=1
[ 5935.738112] 10-...!: (4 GPs behind) idle=62a/1/0 softirq=228207/228208 fqs=1
[ 5935.738868] 11-...!: (32 GPs behind) idle=afe/140000000000000/0 softirq=228817/228818 fqs=1
[ 5935.739122] 12-...!: (3 GPs behind) idle=426/1/0 softirq=192716/192717 fqs=1
[ 5935.739295] 14-...!: (5 GPs behind) idle=e56/140000000000000/0 softirq=133888/133892 fqs=1
[ 5935.739486] 15-...!: (7 GPs behind) idle=36e/140000000000000/0 softirq=161010/161013 fqs=1
...
[ 5935.740031] Unable to handle kernel paging request for data at address 0x00000008
[ 5935.740128] Faulting instruction address: 0xc000000000403d04

For the prior iterations where we don't crash we'd have messages like:

[ 2997.413561] WATCHDOG (Thu May 3 05:13:18 CDT 2018): time jump of 114 seconds
[ 3023.759629] migration iteration 1: completed at Thu May 3 05:13:25 CDT 2018
[ 3239.678964] migration iteration 2: started at Thu May 3 05:16:45 CDT 2018

The WATCHDOG message notes how much of a jump in wall-clock time the guest sees after it resumes execution. These jumps are generally on the order of 1-2 minutes here, since we're doing migration via virsh migrate ... --timeout 60, which manually stops the guest if it hasn't finished migration within 60s.

We now know that the skip in time actually originates from handling within QEMU on the source side of migration, and the guest is reacting to it after it wakes up from migration. A patch has been sent which changes the behavior so that the guest doesn't see a jump in time after resuming:

http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg00928.html

The patch is still under discussion and it's not clear yet whether this is actually a QEMU bug or intended behavior. I'm still testing the patch in conjunction with the original workload and would like to see it run over the weekend before I can say anything with certainty, but so far it has run overnight, whereas prior to the change it would crash after an hour or 2 (though we have seen runs that survived as long as 8 hours, so that's not definitive).
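For reference, the migration loop and in-guest watchdog scripts themselves aren't included in this report; the following is only a minimal sketch of the kind of harness that would produce the messages above (the exact virsh flags and the use of /dev/kmsg to get the markers into dmesg are illustrative assumptions, not the actual scripts):

  # Host side: repeatedly migrate the guest, forcing a stop after 60s if
  # migration hasn't converged. The real runs alternate direction between
  # boslcp3 and boslcp4; only one direction is shown here for brevity.
  i=1
  while true; do
      ssh root@boslcp3g1 "echo 'migration iteration $i: started at $(date)' > /dev/kmsg"
      virsh migrate --live --timeout 60 boslcp3g1 qemu+ssh://boslcp4/system
      ssh root@boslcp3g1 "echo 'migration iteration $i: completed at $(date)' > /dev/kmsg"
      i=$((i + 1))
      sleep 120
  done

  # Guest side: watch for large jumps in wall-clock time and log them to the
  # kernel ring buffer so they show up interleaved with the dmesg output.
  prev=$(date +%s)
  while true; do
      sleep 5
      now=$(date +%s)
      jump=$((now - prev - 5))
      if [ "$jump" -gt 10 ]; then
          echo "WATCHDOG ($(date)): time jump of $jump seconds" > /dev/kmsg
      fi
      prev=$now
  done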
If that survives, it would suggest that the RCU-related crashes are the result of a jump in the guest VCPU's timebase register.

One interesting thing I've noticed is that with a QEMU that *doesn't have the patch above*, disabling RCU stall warning messages via:

  echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress

allowed the workload to run for 16 hours without crashing. This may suggest that the warning messages, in conjunction with rcu_cpu_stall_timeout being exceeded due to the jump in the timebase register, are triggering issues with RCU. What I plan to try next is raising rcu_cpu_stall_timeout to a much higher value (currently 21 on Ubuntu 18.04, it seems) and seeing if that can survive a run over the weekend; a short sketch of these knobs is appended at the end of this message.

All this also suggests it may be possible to trigger the failures synthetically by running a high-IO workload, pausing the guest, advancing the VCPU tb_offset, and then resuming. This is something I plan to try soon to help narrow things down.

https://bugs.launchpad.net/bugs/1768115

Title:
  ISST-LTE:KVM:Ubuntu1804:BostonLC:boslcp3g1: Migration guest running with IO stress crashed@security_file_permission+0xf4/0x160.

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  New

Bug description:
  Problem Description:
  Migration guest running with IO stress crashed@security_file_permission+0xf4/0x160 after a couple of migrations.

  Steps to re-create:
  Source host - boslcp3
  Destination host - boslcp4

  1. boslcp3 & boslcp4 installed with latest kernel
  root@boslcp3:~# uname -a
  Linux boslcp3 4.15.0-20-generic #21+bug166588 SMP Thu Apr 26 15:05:59 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
  root@boslcp3:~#
  root@boslcp4:~# uname -a
  Linux boslcp4 4.15.0-20-generic #21+bug166588 SMP Thu Apr 26 15:05:59 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
  root@boslcp3:~#

  2. Installed guest boslcp3g1 with kernel and started LTP run from boslcp3 host
  root@boslcp3g1:~# uname -a
  Linux boslcp3g1 4.15.0-15-generic #16+bug166877 SMP Wed Apr 18 14:47:30 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux

  3. Started migrating boslcp3g1 guest from source to destination & vice versa.
  4. After a couple of migrations it crashed at boslcp4 & entered xmon:

  8:mon> t
  [c0000004f8a23d20] c0000000005a7674 security_file_permission+0xf4/0x160
  [c0000004f8a23d60] c0000000003d1d30 rw_verify_area+0x70/0x120
  [c0000004f8a23d90] c0000000003d375c vfs_read+0x8c/0x1b0
  [c0000004f8a23de0] c0000000003d3d88 SyS_read+0x68/0x110
  [c0000004f8a23e30] c00000000000b184 system_call+0x58/0x6c
  --- Exception: c01 (System Call) at 000071f1779fe280
  SP (7fffe99ece50) is in userspace

  8:mon> S
  msr   = 8000000000001033   sprg0 = 0000000000000000
  pvr   = 00000000004e1202   sprg1 = c000000007a85800
  dec   = 00000000591e3e03   sprg2 = c000000007a85800
  sp    = c0000004f8a234a0   sprg3 = 0000000000010008
  toc   = c0000000016eae00   dar   = 000000000000023c
  srr0  = c0000000000c355c   srr1  = 8000000000001033   dsisr = 40000000
  dscr  = 0000000000000000   ppr   = 0010000000000000   pir   = 00000011
  amr   = 0000000000000000   uamor = 0000000000000000
  dpdes = 0000000000000000   tir   = 0000000000000000   cir   = 00000000
  fscr  = 0500000000000180   tar   = 0000000000000000   pspb  = 00000000
  mmcr0 = 0000000080000000   mmcr1 = 0000000000000000   mmcr2 = 0000000000000000
  pmc1  = 00000000   pmc2 = 00000000   pmc3 = 00000000   pmc4 = 00000000
  mmcra = 0000000000000000   siar  = 0000000000000000   pmc5  = 0000026c
  sdar  = 0000000000000000   sier  = 0000000000000000   pmc6  = 00000861
  ebbhr = 0000000000000000   ebbrr = 0000000000000000   bescr = 0000000000000000
  iamr  = 4000000000000000   pidr  = 0000000000000034   tidr  = 0000000000000000

  cpu 0x8: Vector: 700 (Program Check) at [c0000004f8a23220]
      pc: c0000000000e4854: xmon_core+0x1f24/0x3520
      lr: c0000000000e4850: xmon_core+0x1f20/0x3520
      sp: c0000004f8a234a0
     msr: 8000000000041033
    current = 0xc0000004f89faf00
    paca    = 0xc000000007a85800
      softe: 0   irq_happened: 0x01
      pid = 24028, comm = top
  Linux version 4.15.0-20-generic (buildd@bos02-ppc64el-002) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 (Ubuntu 4.15.0-20.21-generic 4.15.17)
  cpu 0x8: Exception 700 (Program Check) in xmon, returning to main loop

  [c0000004f8a23d20] c0000000005a7674 security_file_permission+0xf4/0x160
  [c0000004f8a23d60] c0000000003d1d30 rw_verify_area+0x70/0x120
  [c0000004f8a23d90] c0000000003d375c vfs_read+0x8c/0x1b0
  [c0000004f8a23de0] c0000000003d3d88 SyS_read+0x68/0x110
  [c0000004f8a23e30] c00000000000b184 system_call+0x58/0x6c
  --- Exception: c01 (System Call) at 000071f1779fe280
  SP (7fffe99ece50) is in userspace

  8:mon> r
  R00 = c00000000043b7fc   R16 = 0000000000000000
  R01 = c0000004f8a23c90   R17 = ffffffffffffff70
  R02 = c0000000016eae00   R18 = 00000a51b4bebfc8
  R03 = c000000279557200   R19 = 00007fffe99edbb0
  R04 = c0000003242499c0   R20 = 00000a51b4c04db0
  R05 = 0000000000020000   R21 = 00000a51b4c20e90
  R06 = 0000000000000004   R22 = 0000000000040f00
  R07 = ffffff8100000000   R23 = 00000a51b4c06560
  R08 = ffffff8000000000   R24 = ffffffffffffff80
  R09 = 0000000000000000   R25 = 00000a51b4bec2b8
  R10 = 0000000000000000   R26 = 000071f177bb0b20
  R11 = 0000000000000000   R27 = 0000000000000000
  R12 = 0000000000002000   R28 = c000000279557200
  R13 = c000000007a85800   R29 = c0000004c7734210
  R14 = 0000000000000000   R30 = 0000000000000000
  R15 = 0000000000000000   R31 = c0000003242499c0
  pc  = c00000000043b808   __fsnotify_parent+0x88/0x1a0
  cfar= c0000000003f9e78   dget_parent+0xe8/0x150
  lr  = c00000000043b7fc   __fsnotify_parent+0x7c/0x1a0
  msr = 8000000000009033   cr  = 28002222
  ctr = c0000000006252b0   xer = 0000000000000000   trap = 300
  dar = 000000000000023c   dsisr = 40000000

  8:mon> e
  cpu 0x8: Vector: 300 (Data Access) at [c0000004f8a23a10]
      pc: c00000000043b808: __fsnotify_parent+0x88/0x1a0
      lr: c00000000043b7fc:
          __fsnotify_parent+0x7c/0x1a0
      sp: c0000004f8a23c90
     msr: 8000000000009033
     dar: 23c
   dsisr: 40000000
    current = 0xc0000004f89faf00
    paca    = 0xc000000007a85800
      softe: 0   irq_happened: 0x01
      pid = 24028, comm = top
  Linux version 4.15.0-20-generic (buildd@bos02-ppc64el-002) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 (Ubuntu 4.15.0-20.21-generic 4.15.17)

  6. Guest enters into xmon after migrating from boslcp3 to boslcp4.

  > 8:mon> t
  > [c0000004f8a23d20] c0000000005a7674 security_file_permission+0xf4/0x160
  > [c0000004f8a23d60] c0000000003d1d30 rw_verify_area+0x70/0x120
  > [c0000004f8a23d90] c0000000003d375c vfs_read+0x8c/0x1b0
  > [c0000004f8a23de0] c0000000003d3d88 SyS_read+0x68/0x110
  > [c0000004f8a23e30] c00000000000b184 system_call+0x58/0x6c
  > --- Exception: c01 (System Call) at 000071f1779fe280
  > SP (7fffe99ece50) is in userspace
  > 8:mon> r
  > R00 = c00000000043b7fc   R16 = 0000000000000000
  > R01 = c0000004f8a23c90   R17 = ffffffffffffff70
  > R02 = c0000000016eae00   R18 = 00000a51b4bebfc8
  > R03 = c000000279557200   R19 = 00007fffe99edbb0
  > R04 = c0000003242499c0   R20 = 00000a51b4c04db0
  > R05 = 0000000000020000   R21 = 00000a51b4c20e90
  > R06 = 0000000000000004   R22 = 0000000000040f00
  > R07 = ffffff8100000000   R23 = 00000a51b4c06560
  > R08 = ffffff8000000000   R24 = ffffffffffffff80
  > R09 = 0000000000000000   R25 = 00000a51b4bec2b8
  > R10 = 0000000000000000   R26 = 000071f177bb0b20
  > R11 = 0000000000000000   R27 = 0000000000000000
  > R12 = 0000000000002000   R28 = c000000279557200
  > R13 = c000000007a85800   R29 = c0000004c7734210
  > R14 = 0000000000000000   R30 = 0000000000000000
  > R15 = 0000000000000000   R31 = c0000003242499c0
  > pc  = c00000000043b808   __fsnotify_parent+0x88/0x1a0
  > cfar= c0000000003f9e78   dget_parent+0xe8/0x150
  > lr  = c00000000043b7fc   __fsnotify_parent+0x7c/0x1a0
  > msr = 8000000000009033   cr  = 28002222
  > ctr = c0000000006252b0   xer = 0000000000000000   trap = 300
  > dar = 000000000000023c   dsisr = 40000000

  BUG_ON in jbd2_journal_write_metadata_buffer

  I've included xmon crash data from a more recent crash, this time a BUG_ON in fs/jbd2/journal.c:jbd2_journal_write_metadata_buffer():

  int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
                                         struct journal_head *jh_in,
                                         struct buffer_head **bh_out,
                                         sector_t blocknr)
  {
          int need_copy_out = 0;
          int done_copy_out = 0;
          int do_escape = 0;
          char *mapped_data;
          struct buffer_head *new_bh;
          struct page *new_page;
          unsigned int new_offset;
          struct buffer_head *bh_in = jh2bh(jh_in);
          journal_t *journal = transaction->t_journal;

          /*
           * The buffer really shouldn't be locked: only the current committing
           * transaction is allowed to write it, so nobody else is allowed
           * to do any IO.
           *
           * akpm: except if we're journalling data, and write() output is
           * also part of a shared mapping, and another thread has
           * decided to launch a writepage() against this buffer.
           */
          J_ASSERT_BH(bh_in, buffer_jbddirty(bh_in));

  This is not the same as the original bug, but I suspect they are part of a class of issues we're hitting while running under very particular circumstances, which might not generally be seen during normal operation and which trigger various corner cases. As such I think it makes sense to group them under this bug for the time being.

  The general workload is running IO-heavy disk workloads on large guests (20GB memory, 16 vcpus) with SAN-based storage, and then performing migration during the workload.
  During migration we begin to see a high occurrence of rcu_sched stall warnings, and after 1-3 hours of operations we hit filesystem-related crashes like the ones posted. We've seen this with 2 separate FC cards, emulex and qlogic, where we invoke QEMU through libvirt as:

  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none
  /usr/bin/qemu-system-ppc64 -name guest=boslcp3g1,debug-threads=on -S
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-boslcp3g1/master-key.aes
  -machine pseries-2.10,accel=kvm,usb=off,dump-guest-core=off,max-cpu-compat=power9
  -cpu host -m 20480 -realtime mlock=off
  -smp 16,maxcpus=32,sockets=4,cores=8,threads=1
  -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/7-boslcp3g1,size=10737418240
  -numa node,nodeid=0,cpus=0-7,memdev=ram-node0
  -object memory-backend-ram,id=ram-node1,size=10737418240
  -numa node,nodeid=1,cpus=8-15,memdev=ram-node1
  -uuid bd110ed9-dcfc-4470-b4ae-6166a56819f0
  -display none -no-user-config -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-7-boslcp3g1/monitor.sock,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control
  -rtc base=utc -no-shutdown -boot menu=on,strict=on
  -device spapr-pci-host-bridge,index=1,id=pci.1
  -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x3
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2
  -drive file=/home/bionic-server-ppc64el.iso,format=raw,if=none,id=drive-scsi0-0-0-2,readonly=on,cache=none
  -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-0-2,id=scsi0-0-0-2
  -drive file=/dev/disk/by-id/dm-uuid-part1-mpath-3600507680183050d28000000000002a4,format=raw,if=none,id=drive-virtio-disk0,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -drive file=/dev/disk/by-id/dm-uuid-part2-mpath-3600507680183050d28000000000002a4,format=raw,if=none,id=drive-virtio-disk1,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1
  -drive file=/dev/disk/by-id/dm-uuid-part3-mpath-3600507680183050d28000000000002a4,format=raw,if=none,id=drive-virtio-disk2,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2
  -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=30
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:72:d2:69,bus=pci.0,addr=0x1,bootindex=2
  -chardev pty,id=charserial0
  -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30000000
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
  -msg timestamp=on

  I will attach the libvirt XML separately.

  IBM is requesting some general filesystem expertise from Canonical, if they have it available, as we continue debugging...
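Referring back to the RCU stall-warning knobs mentioned in my comment above, here is a minimal sketch of the settings involved (the 300-second value is only an illustrative "much higher" choice, and the rcu_cpu_stall_timeout path is assumed to be the standard rcupdate module parameter alongside rcu_cpu_stall_suppress):

  # Suppress RCU CPU stall warnings entirely (the setting that survived a 16-hour run)
  echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress

  # Check the current stall timeout (reported as 21 seconds on this Ubuntu 18.04 guest)
  cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

  # Raise the timeout well above the 1-2 minute pauses seen across migrations
  echo 300 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout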