IBM needs to identify a patch that fixes this issue. We do not have a
good mechanism to reproduce the bug.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1649513

Title:
  [Ubuntu 16.10] NMI watchdog and soft lockup while running htx memory
  tests in kernel 4.8.0-17-generic

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Issue:
  --------------
  An NMI watchdog BUG and soft lockup occur when the htx memory test is run
  on Ubuntu 16.10.

  Environment:
  --------------------------
  Arch : ppc64le
  Platform : Ubuntu KVM Guest
  Host : Ubuntu 16.10 [4.8.0-17 kernel]
  Guest : Ubuntu 16.10 [4.8.0-17 kernel]

  Steps To Reproduce:
  -----------------------------------

  1 - Install an Ubuntu KVM guest, then install the htx package in the guest
  from this link:
  http://ausgsa.ibm.com/projects/h/htx/public_html/htxonly/htxubuntu-413.deb

  2 - Run the htx mdt.mem test.

  3 - The system hits a soft lockup, as shown below:

  dmesg output:
  [60287.590335] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 1141s! [hxemem64:23468]
  [60287.590572] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
  [60287.590585] CPU: 3 PID: 23468 Comm: hxemem64 Tainted: G             L  4.8.0-17-generic #19-Ubuntu
  [60287.590587] task: c0000012a0971e00 task.stack: c0000012a2d40000
  [60287.590589] NIP: c000000000015004 LR: c000000000015004 CTR: c000000000165e90
  [60287.590591] REGS: c0000012a2d439a0 TRAP: 0901   Tainted: G             L   (4.8.0-17-generic)
  [60287.590592] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48004244  XER: 00000000
  [60287.590603] CFAR: c000000000165890 SOFTE: 1
                 GPR00: c000000000165f9c c0000012a2d43c20 c0000000014e5e00 0000000000000900
                 GPR04: 0000000000000000 0000000000000008 0000000100e4d61a 0000000000000000
                 GPR08: 0000000000000000 0000000000000006 0000000100e4d619 c0000012bfee3130
                 GPR12: 00003fffae6cdc70 00003fffae436900
  [60287.590627] NIP [c000000000015004] arch_local_irq_restore+0x74/0x90
  [60287.590630] LR [c000000000015004] arch_local_irq_restore+0x74/0x90
  [60287.590631] Call Trace:
  [60287.590634] [c0000012a2d43c20] [c0000012bfeccd80] 0xc0000012bfeccd80 (unreliable)
  [60287.590639] [c0000012a2d43c40] [c000000000165f9c] run_timer_softirq+0x10c/0x230
  [60287.590644] [c0000012a2d43ce0] [c000000000b94adc] __do_softirq+0x18c/0x3fc
  [60287.590648] [c0000012a2d43de0] [c0000000000d5828] irq_exit+0xc8/0x100
  [60287.590653] [c0000012a2d43e00] [c000000000024810] timer_interrupt+0xa0/0xe0
  [60287.590657] [c0000012a2d43e30] [c000000000002814] decrementer_common+0x114/0x180
  [60287.590659] Instruction dump:
  [60287.590662] 994d023a 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
  [60287.590670] 7c0803a6 4e800020 60420000 4bfed259 <60000000> 4bffffe4 60420000 e92d0020
  [63127.581494] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 339s! [hxemem64:23467]
  [63127.629682] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
  [63127.629699] CPU: 2 PID: 23467 Comm: hxemem64 Tainted: G             L  4.8.0-17-generic #19-Ubuntu
  [63127.629701] task: c0000012a0965800 task.stack: c0000012a2d58000
  [63127.629703] NIP: 0000000010011e60 LR: 000000001000ec6c CTR: 0000000000f33196
  [63127.629706] REGS: c0000012a2d5bea0 TRAP: 0901   Tainted: G             L   (4.8.0-17-generic)
  [63127.629707] MSR: 800000010000d033 <SF,EE,PR,ME,IR,DR,RI,LE,TM[E]>  CR: 42004482  XER: 00000000
  [63127.629719] CFAR: 0000000010011e68 SOFTE: 1
                 GPR00: 000000001000e854 00003fffadc2e540 0000000010047f00 000000000000000d
                 GPR04: 0000000002000000 00003ff5a8000000 5a5a5a5a5a5a5a5a 00003ff5b0667348
                 GPR08: 0000000000000000 000000001006c8e0 000000001006ca04 fffffffffffff001
                 GPR12: 00003fffae6cdc70 00003fffadc36900
  [63127.629740] NIP [0000000010011e60] 0x10011e60
  [63127.629742] LR [000000001000ec6c] 0x1000ec6c
  [63127.629743] Call Trace:
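
When a test runs for many hours, the soft lockup reports can be buried in a long kernel log. The sketch below pulls out only the CPU number and stall duration from each report. The function name `lockup_summary` is hypothetical, and the pattern assumes the "soft lockup - CPU#N stuck for Ns!" wording seen in the log above; other kernel versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of htx or the kernel tools): read kernel log
# text on stdin and print one line per soft lockup report.
# Assumes the message wording shown in this report:
#   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 1141s!
lockup_summary() {
    sed -n 's/.*soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s!.*/cpu=\1 stuck_s=\2/p'
}

# Example use on a live system (reading the kernel log may need root):
#   dmesg | lockup_summary
```

Lines that are not soft lockup reports, such as register dumps and call traces, are dropped.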

  == Comment: #3 - Santhosh G <santh...@in.ibm.com> - 2016-09-28 02:17:29 ==
  Memory Info :

  root@ubuntu:~# cat /proc/meminfo 
  MemTotal:       78539776 kB
  MemFree:        72219392 kB
  MemAvailable:   77217088 kB
  Buffers:          212544 kB
  Cached:          5249088 kB
  SwapCached:            0 kB
  Active:          1440832 kB
  Inactive:        4107264 kB
  Active(anon):      93888 kB
  Inactive(anon):     8640 kB
  Active(file):    1346944 kB
  Inactive(file):  4098624 kB
  Unevictable:           0 kB
  Mlocked:               0 kB
  SwapTotal:       3443648 kB
  SwapFree:        3443648 kB
  Dirty:                 0 kB
  Writeback:             0 kB
  AnonPages:         87296 kB
  Mapped:            30400 kB
  Shmem:             16128 kB
  Slab:             381440 kB
  SReclaimable:     295872 kB
  SUnreclaim:        85568 kB
  KernelStack:        2176 kB
  PageTables:         2048 kB
  NFS_Unstable:          0 kB
  Bounce:                0 kB
  WritebackTmp:          0 kB
  CommitLimit:    42639808 kB
  Committed_AS:     224768 kB
  VmallocTotal:   8589934592 kB
  VmallocUsed:           0 kB
  VmallocChunk:          0 kB
  HardwareCorrupted:     0 kB
  AnonHugePages:         0 kB
  ShmemHugePages:        0 kB
  ShmemPmdMapped:        0 kB
  CmaTotal:              0 kB
  CmaFree:               0 kB
  HugePages_Total:       9
  HugePages_Free:        9
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:      16384 kB

  free -h :
                total        used        free      shared  buff/cache   available
  Mem:            74G        545M         68G         15M        5.5G         73G
  Swap:          3.3G          0B        3.3G

  == Comment: #5 - Santhosh G <santh...@in.ibm.com> - 2016-09-29 02:49:49 ==
  (In reply to comment #4)
  > Hi Santhosh, 
  > After how long are you seeing this error ?
  > Can you share the output by:
  > 1) start the mdt.mem tests.
  > 2) While the tests are running what is the output of 'free -h' ?
  > 3) Attach /tmp/htxerr 
  > 
  > Thank you.

  Hi Vaishnavi,

  I have run the test for more than 12 hours and am not sure exactly when
  the lockup occurs.

  Before starting the tests,

  free -h :
                total        used        free      shared  buff/cache   available
  Mem:            74G        528M         68G         15M        5.5G         73G
  Swap:          3.3G          0B        3.3G

  After running the tests for more than 10 min:

                total        used        free      shared  buff/cache   available
  Mem:            74G        570M         20G         48G         53G         25G
  Swap:          3.3G          0B        3.3G

  The memory usage gradually increases.

  I am not sure exactly at which point the lockup occurs.

  And /tmp/htxerr is empty.
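
Since it is not clear when the lockup occurs, one option is to log memory use alongside the system uptime while mdt.mem runs, so the samples can be lined up roughly against the dmesg timestamps. This is a minimal sketch under assumptions: `mem_unavailable_pct` is a hypothetical helper name, the log path in the example is arbitrary, and "used" is taken here to mean MemTotal minus MemAvailable.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-format text on stdin and print the
# percentage of memory that is NOT available (MemTotal - MemAvailable),
# truncated to a whole number.
mem_unavailable_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Example use while the test is running (one sample per minute; the first
# field of /proc/uptime is seconds since boot):
#   while sleep 60; do
#       echo "$(cut -d' ' -f1 /proc/uptime) $(mem_unavailable_pct < /proc/meminfo)%"
#   done >> /tmp/memlog
```

The last samples written before a lockup would show how much memory was still available at that point.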

  == Comment: #7 - Vaishnavi Bhat <vaish...@in.ibm.com> - 2016-09-30 04:03:23 ==
  Hi Santhosh ,

  While running mdt.mem, we see that about 60% of memory is used and
  free swap is reduced to 0B.
                total        used        free      shared  buff/cache   available
  Mem:            74G        570M         20G         48G         53G         25G
  Swap:          3.3G          0B        3.3G

  Top output
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   1860 root      38  18 48.484g 0.046t 0.046t S 318.1 63.5   4865:53 hxemem64

  Also, dmesg shows traces of OOM and soft lockups with hxemem.

  Can you please try increasing the vm.min_free_kbytes value and see whether
  it shows any improvement? I would suggest starting with double the current
  value.
  Current value:
  $ sysctl -n vm.min_free_kbytes
  180224
  New value:
  $ sysctl -w vm.min_free_kbytes=<new value>

  Thank you.
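
The "double the current value" suggestion above can be sketched as a small helper. `suggest_min_free` is a hypothetical name, and the function only prints the command it would suggest; it does not change any setting.

```shell
#!/bin/sh
# Hypothetical helper: print the sysctl command that would double
# vm.min_free_kbytes. Nothing is modified on the system.
# Usage: suggest_min_free [current_kb]
suggest_min_free() {
    # Use the argument if one is given; otherwise read the live value.
    current=${1:-$(sysctl -n vm.min_free_kbytes)}
    echo "sysctl -w vm.min_free_kbytes=$((current * 2))"
}

# Example with the value reported above:
#   suggest_min_free 180224
```

Running the printed `sysctl -w` command requires root, and the setting does not persist across a reboot unless it is also added to /etc/sysctl.conf.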

  == Comment: #10 - Vaishnavi Bhat <vaish...@in.ibm.com> - 2016-10-20 04:06:20 ==
  (In reply to comment #9)
  > Hi Vaishnavi,
  > 
  > I am able to reproduce this issue even in 4.8.0-22-generic
  > 
  > o/p:
  > sysctl -n vm.min_free_kbytes
  > 360448
  > 
  > Please take a look into the issue.
  > 
  > Thanks.

  Thanks for the confirmation. The issue still reproduces with the value
  doubled:
  sysctl -n vm.min_free_kbytes
  360448

  Thank you.
