[Kernel-packages] [Bug 1659111] Re: UbuntuKVM guest crashed while running I/O stress test with Ubuntu kernel 4.4.0-47-generic

Kleber Sacilotto de Souza Wed, 26 Jul 2017 00:32:13 -0700

** Changed in: linux (Ubuntu Yakkety)
       Status: In Progress => Won't Fix


-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1659111

Title:
  UbuntuKVM guest crashed while running I/O stress test with Ubuntu
  kernel 4.4.0-47-generic

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Won't Fix
Status in linux source package in Zesty:
  Incomplete

Bug description:
  Attn. Canonical: For your awareness only at this time.

  == Comment: #0 - LEKSHMI C. PILLAI  - 2016-11-22 03:49:38 ==

  Machine INFO

  KVM HOST: luckyv1

  Guest: lucky05

  lucky05 crashed while running the I/O stress test for SAN disks.

  Installed lucky05 and enabled xmon on it. After that, started the raw
  disk test on around 50 disks. After 6-7 hours of running, the machine
  dropped into xmon.

  Logs:
  [25023.224182] Unable to handle kernel paging request for data at address 0x00000000
  [25023.224257] Faulting instruction address: 0xc000000000324c60
  cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3620]
      pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
      lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
      sp: c0000000fffc38a0
     msr: 8000000100009033
     dar: 0
   dsisr: 40000000
    current = 0xc0000000ff99e470
    paca    = 0xc00000000fb41c80   softe: 0        irq_happened: 0x01
      pid   = 14736, comm = kworker/u16:8
  enter ? for help
  [c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
  [c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
  [c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
  [c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
  [c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
  [c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
  [c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
  [c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
  3:mon> f
  3:mon> th
  [c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
  [c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
  [c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
  [c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
  [c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
  [c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
  [c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
  [c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4         
  3:mon> sh
  [27384.651055] INFO: rcu_sched detected stalls on CPUs/tasks:
  [27384.651220]  (detected by 4, t=40598 jiffies, g=2849830, c=2849829, q=992)
  [27384.651286] All QSes seen, last rcu_sched kthread activity 40596 (4301188714-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [27384.651501] rcu_sched kthread starved for 40596 jiffies! g2849830 c2849829 f0x2 s3 ->state=0x0
  [27384.651747] INFO: rcu_sched detected stalls on CPUs/tasks:
  [27384.651905]  (detected by 4, t=590354 jiffies, g=2849830, c=2849829, q=1285)
  [27384.652012] All QSes seen, last rcu_sched kthread activity 590352 (4301738470-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [27384.652191] rcu_sched kthread starved for 590352 jiffies! g2849830 c2849829 f0x2 s3 ->state=0x0
  [27384.730645] Unable to handle kernel paging request for data at address 0xffffffffffffffd8
  [27384.730781] Faulting instruction address: 0xc0000000000e7258
  cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3000]
      pc: c0000000000e7258: kthread_data+0x28/0x40
      lr: c0000000000de940: wq_worker_sleeping+0x30/0x110
      sp: c0000000fffc3280
     msr: 8000000100009033
     dar: ffffffffffffffd8
   dsisr: 40000000
    current = 0xc0000000ff99e470
    paca    = 0xc00000000fb41c80   softe: 0        irq_happened: 0x01
      pid   = 14736, comm = kworker/u16:8
  enter ? for help                
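Each stall report above prints the rcu_sched kthread starvation time next to the two jiffies counters it came from, in the form "activity N (now-last)". The sketch below re-derives N from each pair as a quick consistency check. The regular expression and the helper name `check_activity` are illustrative assumptions and are not part of any kernel tooling. The sketch stays in jiffies because the guests' HZ value is not given in this report.

```python
import re

# Matches the "last rcu_sched kthread activity N (now-last)" fragment of the
# stall reports above. This pattern is an assumption written for these two
# lines, not a stable kernel log format.
ACTIVITY_RE = re.compile(r"last rcu_sched kthread activity (\d+) \((\d+)-(\d+)\)")


def check_activity(line):
    """Return (reported, recomputed) starvation in jiffies for one log line."""
    m = ACTIVITY_RE.search(line)
    if m is None:
        raise ValueError("not an rcu_sched activity line")
    reported, now, last = (int(g) for g in m.groups())
    return reported, now - last


lines = [
    "All QSes seen, last rcu_sched kthread activity 40596 "
    "(4301188714-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0",
    "All QSes seen, last rcu_sched kthread activity 590352 "
    "(4301738470-4301148118), jiffies_till_next_fqs=1, root ->qsmask 0x0",
]

for line in lines:
    reported, recomputed = check_activity(line)
    print(reported, recomputed)  # the two numbers on each row should match
```

Both lines share the same last-activity stamp (4301148118). The second report therefore measures a longer stretch since the same last kthread activity.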

  == Comment: #1 - LEKSHMI C. PILLAI - 2016-11-22 04:05:41 ==
  3:mon> th
  [c0000000fffc32b0] c0000000000de940 wq_worker_sleeping+0x30/0x110
  [c0000000fffc32f0] c000000000af31bc __schedule+0x6ec/0x990
  [c0000000fffc33c0] c000000000af34a8 schedule+0x48/0xc0
  [c0000000fffc33f0] c0000000000bd3d0 do_exit+0x760/0xc30
  [c0000000fffc34b0] c000000000020bf4 die+0x314/0x470
  [c0000000fffc3540] c000000000050d98 bad_page_fault+0xd8/0x150
  [c0000000fffc35b0] c000000000008680 handle_page_fault+0x2c/0x30
  --- Exception: 300 (Data Access) at c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
  [c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
  [c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
  [c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
  [c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
  [c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
  [c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
  [c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
  [c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
  3:mon>

  == Comment: #6 - Laurent Dufour - 2016-11-23 03:00:16 ==
  Logged in to luckyv1 and found a lot of ipr issues on this node:
  [525973.896624] qla2xxx 0005:09:00.0: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
  [525973.956619] qla2xxx 0005:09:00.1: vpd r/w failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
  [529433.834853] ipr 0001:04:00.0: FFFE: Soft device bus error recovered by the IOA
  [529433.834867] ipr: -----Failing Device Information-----
  [529433.834870] ipr: World Wide Unique ID: 500507605EC10C000000000000000000
  [529433.834873] ipr: Device Resource Path: FF
  [529433.834875] ipr: Primary Problem Description: Command Timeout
  [529433.834878] ipr: Secondary Problem Description:  Command timeout expired
  [529433.834880] ipr: SCSI Sense Data:
  [529433.834882] ipr: 00000000: 00000000 00000000 00000000 00000000
  [529433.834884] ipr: 00000010: 00000000 00000000 00000000 00000000
  [529433.834886] ipr: SCSI Command Descriptor Block: 
  [529433.834889] ipr: 00000000: 9E120004 0F000000 00000000 0020AD00
  [529433.834891] ipr: Additional IOA Data:
  [529433.834893] ipr: 00000000: 4646001C 44010007 00050000 04700002
  [529433.834895] ipr: 00000010: 3B894A49 1EE620CC 04700002 49574631
  [529433.834897] ipr: 00000020: 455300CC 06B00027 00000020 84000000
  [529433.834899] ipr: 00000030: 00000000 05801000 0B29A7C0 00000000
  [529433.834901] ipr: 00000040: 00000000 00000000 00000000 00000000
  [529433.834904] ipr: 00000050: 00000000 00000000 00000000 00000000
  [529433.834906] ipr: 00000060: 00000000 00000000 00000000 00000000
  [529433.834908] ipr: 00000070: 00000000 00000000 00000000 00000000
  [529433.834910] ipr: 00000080: 00000000 00000000 00000000 00000000
  [529433.834912] ipr: 00000090: 00000000 00000000 00000000 00000000
  [529433.834914] ipr: 000000A0: 00000000 D4000018 80000000 FFFFFFFF
  [529433.834917] ipr: 000000B0: FFFFFFFF 00000000 0980EC21 00000000
  [529433.834919] ipr: 000000C0: 00000000 00000000 01769A24 00000000
  [529433.834921] ipr: 000000D0: 01D3C300 E0050000 FFFFFFFE 0B5A0000
  [529433.834923] ipr: 000000E0: 00000000 9E120004 0F000000 00000000
  [529433.834926] ipr: 000000F0: 43440010 9E120004 0F000000 00000000
  [529433.834928] ipr: 00000100: 0020AD00 45480010 0100E038 9E12FFFF
  [529433.834930] ipr: 00000110: 01080002 00000000 45540004 00001463

  In addition, some NFS issues are reported:
  [563034.817901] nfs: server 10.33.11.31 not responding, timed out
  [563405.504308] nfs: server 10.33.11.31 not responding, timed out

  That said, chig5 entered xmon due to a bad pointer in the kernel:
  3:mon> e
  cpu 0x3: Vector: 300 (Data Access) at [c0000000fffc3000]
      pc: c0000000000e7258: kthread_data+0x28/0x40
      lr: c0000000000de940: wq_worker_sleeping+0x30/0x110
      sp: c0000000fffc3280
     msr: 8000000100009033
     dar: ffffffffffffffd8
   dsisr: 40000000
    current = 0xc0000000ff99e470
    paca    = 0xc00000000fb41c80         softe: 0        irq_happened: 0x01
      pid   = 14736, comm = kworker/u16:8
  3:mon> th
  [c0000000fffc32b0] c0000000000de940 wq_worker_sleeping+0x30/0x110
  [c0000000fffc32f0] c000000000af31bc __schedule+0x6ec/0x990
  [c0000000fffc33c0] c000000000af34a8 schedule+0x48/0xc0
  [c0000000fffc33f0] c0000000000bd3d0 do_exit+0x760/0xc30
  [c0000000fffc34b0] c000000000020bf4 die+0x314/0x470
  [c0000000fffc3540] c000000000050d98 bad_page_fault+0xd8/0x150
  [c0000000fffc35b0] c000000000008680 handle_page_fault+0x2c/0x30
  --- Exception: 300 (Data Access) at c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
  [c0000000fffc3900] c00000000032831c writeback_sb_inodes+0x30c/0x590
  [c0000000fffc3a10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
  [c0000000fffc3a70] c000000000328aec wb_writeback+0x30c/0x450
  [c0000000fffc3b40] c0000000003296b4 wb_workfn+0x264/0x570
  [c0000000fffc3c50] c0000000000dd930 process_one_work+0x1e0/0x5a0
  [c0000000fffc3ce0] c0000000000dde84 worker_thread+0x194/0x680
  [c0000000fffc3d80] c0000000000e6980 kthread+0x110/0x130
  [c0000000fffc3e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4

  Looking at the other guests, since Lekshmi mentioned that all the
  guests are crashing.

  == Comment: #7 - Laurent Dufour - 2016-11-23 03:24:34 ==
  The guest lucky01 (4.4.0-47-generic) is fine:
  root@lucky01:/Blast# date
  Wed Nov 23 03:04:23 CST 2016

  The guest lucky02 (4.4.0-47-generic) has entered xmon due to the same issue as lucky05:
  7:mon> e
  cpu 0x7: Vector: 300 (Data Access) at [c0000001f265b620]
      pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
      lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
      sp: c0000001f265b8a0
     msr: 8000000100009033
     dar: 0
   dsisr: 40000000
    current = 0xc0000001f222fcc0
    paca    = 0xc00000000fb44280         softe: 0        irq_happened: 0x01
      pid   = 12062, comm = kworker/u16:3
  7:mon> t
  [c0000001f265b900] c00000000032831c writeback_sb_inodes+0x30c/0x590
  [c0000001f265ba10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
  [c0000001f265ba70] c000000000328aec wb_writeback+0x30c/0x450
  [c0000001f265bb40] c0000000003296b4 wb_workfn+0x264/0x570
  [c0000001f265bc50] c0000000000dd930 process_one_work+0x1e0/0x5a0
  [c0000001f265bce0] c0000000000dde84 worker_thread+0x194/0x680
  [c0000001f265bd80] c0000000000e6980 kthread+0x110/0x130
  [c0000001f265be30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
  --- Exception: 0  at 0000000000000000

  The guest lucky03 didn't enter xmon but is no longer responding. Unfortunately, sysrq is not enabled on this guest. There is still some activity on this guest:
  root@luckyv1:~# virsh qemu-monitor-command --hmp lucky03 'info cpus'
  * CPU #0: nip=0xc0000000001035e0 thread_id=76434
    CPU #1: nip=0xc0000000000863dc thread_id=76435
    CPU #2: nip=0xc0000000000863dc thread_id=76436
    CPU #3: nip=0xc0000000000863dc thread_id=76437
    CPU #4: nip=0xc0000000000863dc thread_id=76439
    CPU #5: nip=0xc0000000000863dc thread_id=76440
    CPU #6: nip=0x0000000010072f68 thread_id=76441
    CPU #7: nip=0xc0000000000863dc thread_id=76442

  
  The guest lucky04 is not responding but did not enter xmon either; sysrq is not enabled on this node.
  However, the node still seems to be active:
  root@luckyv1:~# virsh qemu-monitor-command --hmp lucky04 'info cpus'
  * CPU #0: nip=0xc000000000af8834 thread_id=68201
    CPU #1: nip=0xc0000000000863dc thread_id=68202
    CPU #2: nip=0xc0000000000645ac thread_id=68203
    CPU #3: nip=0xc0000000000863dc thread_id=68204
    CPU #4: nip=0xc0000000000863dc thread_id=68205
    CPU #5: nip=0xc0000000000863dc thread_id=68206
    CPU #6: nip=0xc000000000064590 thread_id=68207
    CPU #7: nip=0xc000000000af8904 thread_id=68208

  The guest lucky06 is alive:
  root@lucky06:/# cat /proc/version; date
  Linux version 4.4.0-47-generic (buildd@bos01-ppc64el-008) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016
  Wed Nov 23 03:20:19 CST 2016

  To summarize:
  lucky01  good
  lucky02  panic in locked_inode_to_wb_and_lock_list()
  lucky03  not responding but still active
  lucky04  not responding but still active
  lucky05  panic in locked_inode_to_wb_and_lock_list()
  lucky06  good

  == Comment: #10 - Laurent Dufour - 2016-11-24 10:27:52 ==
  Here is the data I captured on lucky02, which panicked the same way lucky05 did.

  CPU 7 panicked due to a data access error:
  7:mon> e
  cpu 0x7: Vector: 300 (Data Access) at [c0000001f265b620]
      pc: c000000000324c60: locked_inode_to_wb_and_lock_list+0x50/0x290
      lr: c00000000032831c: writeback_sb_inodes+0x30c/0x590
      sp: c0000001f265b8a0
     msr: 8000000100009033
     dar: 0
   dsisr: 40000000
    current = 0xc0000001f222fcc0
    paca    = 0xc00000000fb44280         softe: 0        irq_happened: 0x01
      pid   = 12062, comm = kworker/u16:3
  7:mon> r
  R00 = c00000000032831c   R16 = c0000001fc972ef8
  R01 = c0000001f265b8a0   R17 = c0000001fc972e70
  R02 = c0000000015c6a00   R18 = c0000001fc972f60
  R03 = c0000001fc972e70   R19 = 0000000000000000
  R04 = c0000001f2230700   R20 = 0000000000000000
  R05 = 0000000000000000   R21 = c0000001f2658000
  R06 = 00000001fef30000   R22 = c0000001f35d5c88
  R07 = 000108f684c40713   R23 = c0000001f35d5c68
  R08 = 0000000000000000   R24 = 0000000000000000
  R09 = 0000000000000000   R25 = c0000001fc972ef8
  R10 = 0000000080000007   R26 = 0000000000000000
  R11 = 00000000030883ec   R27 = 0000000000000000
  R12 = 0000000000000000   R28 = 0000000000000001
  R13 = c00000000fb44280   R29 = c0000001fc972e70
  R14 = c0000000000e6878   R30 = c0000001f265bba0
  R15 = 0000000000000000   R31 = 0000000000000000 
  pc  = c000000000324c60 locked_inode_to_wb_and_lock_list+0x50/0x290
  cfar= 00003fff9647a5a8
  lr  = c00000000032831c writeback_sb_inodes+0x30c/0x590
  msr = 8000000100009033   cr  = 24652882
  ctr = c000000000110b50   xer = 0000000020000000   trap =  300
  dar = 0000000000000000   dsisr = 40000000
  7:mon> t 
  [c0000001f265b900] c00000000032831c writeback_sb_inodes+0x30c/0x590
  [c0000001f265ba10] c000000000328684 __writeback_inodes_wb+0xe4/0x150
  [c0000001f265ba70] c000000000328aec wb_writeback+0x30c/0x450
  [c0000001f265bb40] c0000000003296b4 wb_workfn+0x264/0x570
  [c0000001f265bc50] c0000000000dd930 process_one_work+0x1e0/0x5a0
  [c0000001f265bce0] c0000000000dde84 worker_thread+0x194/0x680
  [c0000001f265bd80] c0000000000e6980 kthread+0x110/0x130
  [c0000001f265be30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4

  The system tried to access data pointed to by r31, which holds a value loaded from the inode whose address is stored in r29.
  The panic happened during the inlined call to wb_get(), where inode->i_wb is used.
  So here inode->i_wb is NULL, which is not expected to happen.
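The reasoning above can be made concrete with a small model of the faulting load. The model makes two assumptions. First, it assumes a CONFIG_CGROUP_WRITEBACK build, where the inlined wb_get() first reads wb->bdi to compare wb against &wb->bdi->wb. Second, it assumes bdi is the first member of struct bdi_writeback. Under those assumptions, a NULL inode->i_wb (r31 = 0) makes that first load touch address 0 + offset(bdi) = 0, which is consistent with "dar: 0". `BdiWritebackModel` and `faulting_address` are hypothetical names. The model is a heavily trimmed stand-in, not the real kernel layout.

```python
import ctypes


# Hypothetical, trimmed stand-in for struct bdi_writeback. Only the member
# order matters here: bdi is assumed to be the first member, as in 4.4-era
# include/linux/backing-dev-defs.h. All other members are omitted.
class BdiWritebackModel(ctypes.Structure):
    _fields_ = [
        ("bdi", ctypes.c_void_p),   # struct backing_dev_info *bdi
        ("state", ctypes.c_ulong),  # stand-in for the remaining members
    ]


def faulting_address(wb_ptr, field):
    """Address a load of wb->field would touch when wb == wb_ptr."""
    return wb_ptr + getattr(BdiWritebackModel, field).offset


# r31 held inode->i_wb, which was 0; wb_get() first reads wb->bdi.
r31 = 0x0
print(hex(faulting_address(r31, "bdi")))  # prints 0x0, matching "dar: 0"
```

A non-zero offset for the first dereferenced member would instead show up as a small non-zero dar. That is why the member order is the key assumption here.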

  At this time, CPU 6 is waiting here for the same inode's spinlock, inode->i_lock, to be released:
  6:mon> t
  [link register   ] c000000000064624 __spin_yield+0xb4/0xc0
  [c0000000fdb93900] c0000000fdb93940 (unreliable)
  [c0000000fdb93970] c000000000af8968 _raw_spin_lock+0xd8/0xe0
  [c0000000fdb939a0] c000000000327330 __mark_inode_dirty+0xd0/0x4a0
  [c0000000fdb93a20] c0000000003326f0 mark_buffer_dirty+0x1f0/0x210
  [c0000000fdb93a60] c000000000334ff0 __block_commit_write.isra.7+0xf0/0x170
  [c0000000fdb93ad0] c00000000033513c block_write_end+0x7c/0x100
  [c0000000fdb93b20] c00000000033a340 blkdev_write_end+0x60/0xa0
  [c0000000fdb93b80] c00000000022d340 generic_perform_write+0x180/0x280
  [c0000000fdb93c20] c00000000022f568 __generic_file_write_iter+0x208/0x250
  [c0000000fdb93c80] c00000000033b498 blkdev_write_iter+0x98/0x160
  [c0000000fdb93cf0] c0000000002e24a4 new_sync_write+0xc4/0x120
  [c0000000fdb93d90] c0000000002e32a0 vfs_write+0xc0/0x230
  [c0000000fdb93de0] c0000000002e42dc SyS_write+0x6c/0x110
  [c0000000fdb93e30] c000000000009204 system_call+0x38/0xb4
  --- Exception: c01 (System Call) at 00003fff944c6728
  SP (3ffef9ffe0c0) is in userspace

  CPU 6 holds inode->i_lock in the call to inode_to_wb_and_lock_list().
  Why is inode->i_wb NULL?

  == Comment: #11 - Laurent Dufour - 2016-11-25 11:57:50 ==
  I found that lucky03 also hit the panic.
  I took a closer look, and it seems that there is a locking / memory barrier issue between the code run in locked_inode_to_wb_and_lock_list() and another CPU. I found that CPU 5 was running 'latest_blast' at the time CPU 0 hit the panic. The same applies to lucky02.

  == Comment: #13 - Laurent Dufour - 2016-12-05 07:32:30 ==
  I did some tests on luckyv05 and was able to recreate it on a vanilla 4.8 kernel:
  [113031.075540] Unable to handle kernel paging request for data at address 0x00000000
  [113031.075614] Faulting instruction address: 0xc0000000003692e0
  0:mon> t
  [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590
  [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150
  [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450
  [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580
  [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590
  [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660
  [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130
  [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
  --- Exception: 0  at 0000000000000000
  0:mon> e
  cpu 0x0: Vector: 300 (Data Access) at [c0000000fb65f620]
      pc: c0000000003692e0: locked_inode_to_wb_and_lock_list+0x50/0x290
      lr: c00000000036cb6c: writeback_sb_inodes+0x30c/0x590
      sp: c0000000fb65f8a0
     msr: 800000010280b033
     dar: 0
   dsisr: 40000000
    current = 0xc0000001d69be400
    paca    = 0xc000000003480000         softe: 0        irq_happened: 0x01
      pid   = 18689, comm = kworker/u16:10
  Linux version 4.8.0 (laurent@lucky05) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #1 SMP Thu Dec 1 09:25:13 CST 2016

  So this is not an Ubuntu-specific issue but a more general one, which is not fixed by the patch
  https://patchwork.kernel.org/patch/9247955/
  as had been expected while investigating bug 142781.

  == Comment: #17 - Laurent Dufour - 2016-12-07 03:22:05 ==
  For the record, I also hit the bug with 4.9-rc8:
  4:mon> t
  [c000000012a7f900] c0000000003787cc writeback_sb_inodes+0x30c/0x590
  [c000000012a7fa10] c000000000378b34 __writeback_inodes_wb+0xe4/0x150
  [c000000012a7fa70] c000000000378f9c wb_writeback+0x30c/0x450
  [c000000012a7fb40] c000000000379df8 wb_workfn+0x268/0x580
  [c000000012a7fc50] c0000000000f8c20 process_one_work+0x1e0/0x590
  [c000000012a7fce0] c0000000000f9078 worker_thread+0xa8/0x650
  [c000000012a7fd80] c000000000101a30 kthread+0x110/0x130
  [c000000012a7fe30] c00000000000c0e8 ret_from_kernel_thread+0x5c/0x74
  4:mon> e
  cpu 0x4: Vector: 300 (Data Access) at [c000000012a7f620]
      pc: c000000000374f40: locked_inode_to_wb_and_lock_list+0x50/0x290
      lr: c0000000003787cc: writeback_sb_inodes+0x30c/0x590
      sp: c000000012a7f8a0
     msr: 800000010280b033
     dar: 0
   dsisr: 40000000
    current = 0xc000000011540000
    paca    = 0xc000000003482400         softe: 0        irq_happened: 0x01
      pid   = 8357, comm = kworker/u16:3
  Linux version 4.9.0-rc8 (root@lucky05) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #2 SMP Tue Dec 6 05:17:47 CST 2016

  == Comment: #24 - Thiago Jung Bauermann - 2017-01-11 16:09:45 ==
  Dan Williams posted a patch series on 01/06 which aims to solve this bug:

  https://www.spinics.net/lists/linux-fsdevel/msg106092.html

  Unfortunately, the kernel test robot found problems with it:

  http://lkml.iu.edu/hypermail/linux/kernel/1701.1/00239.html

  Still, I think it's useful to perform tests to confirm that:

  1. v4.10 is still affected by the problem and
  2. Dan's patches fix this bug.

  Therefore, could you please reproduce this bug on the unmodified
  v4.10-rc3 build below?

  http://kernel.stglabs.ibm.com/~bauermann/bug149014/v4.10-rc3/

  This will allow us to confirm point 1.

  Then, can you please try to reproduce it with the build below?

  http://kernel.stglabs.ibm.com/~bauermann/bug149014/fix-backing_dev_info-lifetime-v2/

  This one is v4.10-rc3 with Dan Williams's two patches from the link
  above applied.

  == Comment: #28 - Lata Kuntal - 2017-01-16 01:34:05 ==
  I am seeing the same crash on one of the UbuntuKVM 16.04.02 guests, gusg8.
  Pasting the console logs below:

  root@guskvm:~# virsh console gusg8 --force
  Connected to domain gusg8
  Escape character is ^]

  0:mon>
  0:mon>
  0:mon> t
  [c00000023d1ab900] c00000000036a41c writeback_sb_inodes+0x30c/0x590
  [c00000023d1aba10] c00000000036a784 __writeback_inodes_wb+0xe4/0x150
  [c00000023d1aba70] c00000000036abfc wb_writeback+0x30c/0x450
  [c00000023d1abb40] c00000000036ba38 wb_workfn+0x268/0x580
  [c00000023d1abc50] c0000000000ef5e8 process_one_work+0x1e8/0x5b0
  [c00000023d1abce0] c0000000000efa58 worker_thread+0xa8/0x650
  [c00000023d1abd80] c0000000000f8224 kthread+0x114/0x140
  [c00000023d1abe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
  --- Exception: 0  at 0000000000000000
  0:mon>
  0:mon>
  0:mon> d
  0000000000000000 **************** ****************  |                |
  0:mon> r
  R00 = c00000000036a41c   R16 = c00000027ca0e868
  R01 = c00000023d1ab8a0   R17 = c00000027ca0e7e0
  R02 = c0000000014a6600   R18 = c00000027ca0e8d0
  R03 = c00000027ca0e7e0   R19 = 0000000000000000
  R04 = c0000001b092e710   R20 = 0000000000000000
  R05 = 0000000000000000   R21 = c00000023d1a8000
  R06 = 000000027ee30000   R22 = c000000273aace50
  R07 = 00001d0c11165f1a   R23 = c000000273aace30
  R08 = 0000000000000000   R24 = 0000000000000000
  R09 = 0000000000000000   R25 = 0000000000000000
  R10 = 0000000080000000   R26 = c00000027ca0e868
  R11 = c0000000014daae0   R27 = 0000000000000000
  R12 = 0000000000005500   R28 = 0000000000000001
  R13 = c00000000fb80000   R29 = c00000027ca0e7e0
  R14 = c0000000000f8118   R30 = c00000023d1abba0
  R15 = 0000000000000000   R31 = 0000000000000000
  pc  = c000000000366be4 locked_inode_to_wb_and_lock_list+0x54/0x290
  cfar= d000000004bbf2e4 xfs_buf_delwri_submit_buffers+0x1e4/0x2b0 [xfs]
  lr  = c00000000036a41c writeback_sb_inodes+0x30c/0x590
  msr = 800000010280b033   cr  = 24aa2882
  ctr = c000000000122210   xer = 0000000020000000   trap =  300
  dar = 0000000000000000   dsisr = 40000000
  0:mon> c
  cpus stopped: 0x0-0x3
  0:mon> e
  cpu 0x0: Vector: 300 (Data Access) at [c00000023d1ab620]
      pc: c000000000366be4: locked_inode_to_wb_and_lock_list+0x54/0x290
      lr: c00000000036a41c: writeback_sb_inodes+0x30c/0x590
      sp: c00000023d1ab8a0
     msr: 800000010280b033
     dar: 0
   dsisr: 40000000
    current = 0xc0000001b092dc00
    paca    = 0xc00000000fb80000   softe: 0        irq_happened: 0x01
      pid   = 774, comm = kworker/u8:3
  Linux version 4.8.0-34-generic (buildd@bos01-ppc64el-026) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 (Ubuntu 4.8.0-34.36~16.04.1-generic 4.8.11)
  0:mon>

  
  == Comment: #33 - Thiago Jung Bauermann - 2017-01-23 15:31:24 ==
  Lekshmi mentioned that she wasn't able to reproduce this bug with kernel 4.10.0-rc3fixlifetime+, so I replied to Dan's patch series mentioning that it fixes this bug:

  https://www.spinics.net/lists/linux-fsdevel/msg106830.html

  Let's see if he answers back with a status or thoughts regarding the
  patch series.

  == Comment: #34 - LEKSHMI C. PILLAI  - 2017-01-24 00:26:22 ==
  Hi

  The fix worked with the 4.10.0-rc3fixlifetime+ kernel. We need to know
  which kernel the fix is going to land in, and whether we can get a
  workaround for 16.04.02, i.e. kernel 4.8.

  
  Thanks
  Lekshmi

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1659111/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
