Question to IBM: have you made any progress towards identifying a patch
to address this issue?

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1632458

Title:
  [Ubuntu 16.10] - System crashes and gives out call traces when
  libhugetlbfs test suite is run.

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  == Comment: #0 - Santhosh G <santh...@in.ibm.com> - 2016-09-27 01:55:00 ==
  Issue:
  The kernel is unable to handle a paging request when the heapshrink test
  case from the libhugetlbfs suite is run.

  Environment:
  arch - ppc64le
  ubuntu kvm guest

  Host related Info:
  Kernel:
  -----------------
  uname -a
  Linux ltc-haba1 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Memory:
  --------------------
  root@ltc-haba1:~# free -h
                total        used        free      shared  buff/cache   available
  Mem:           255G         65G        187G         22M        1.9G        188G
  Swap:          225G          0B        225G

  Hugepages configured:
  ----------------------------------------
  root@ltc-haba1:~# cat /proc/meminfo | grep -i Huge
  AnonHugePages:     81920 kB
  ShmemHugePages:        0 kB
  HugePages_Total:    4096
  HugePages_Free:     3584
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:      16384 kB

  
  Guest Related Info:
  --------------------------------------
  -------------------------------------
  Kernel:
  -------------------------
  root@ubuntu:~/libhugetlbfs# uname -a
  Linux ubuntu 4.8.0-17-generic #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Memory:
  ---------------------------------
  root@ubuntu:~/libhugetlbfs# free -h
                total        used        free      shared  buff/cache   available
  Mem:           8.0G        133M        7.7G         15M        132M        7.5G
  Swap:          3.3G          0B        3.3G

  Hugepages configured:
  -------------------------------------------
  root@ubuntu:~/libhugetlbfs# cat /proc/meminfo | grep -i Huge
  AnonHugePages:         0 kB
  ShmemHugePages:        0 kB
  HugePages_Total:     256
  HugePages_Free:      256
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:      16384 kB
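
The size of the guest's huge page pool follows from the fields above (page count times page size). A minimal sketch of that arithmetic, assuming the `/proc/meminfo` text is pasted in as a string; `parse_meminfo` and `hugepage_pool_kb` are illustrative helper names, not part of libhugetlbfs or the original report:

```python
import re

# Sample copied from the guest's /proc/meminfo output quoted in this report.
SAMPLE = """\
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:     256
HugePages_Free:      256
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB
"""


def parse_meminfo(text):
    """Return {field: int}. Sizes stay in kB; HugePages_* values are page counts."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def hugepage_pool_kb(fields):
    """Return (total_kb, free_kb) for the persistent huge page pool."""
    size_kb = fields["Hugepagesize"]
    return fields["HugePages_Total"] * size_kb, fields["HugePages_Free"] * size_kb


if __name__ == "__main__":
    total_kb, free_kb = hugepage_pool_kb(parse_meminfo(SAMPLE))
    print(f"pool: {total_kb // 1024} MiB total, {free_kb // 1024} MiB free")
```

For the guest values this gives 256 x 16384 kB = 4096 MiB, all of it free before the test run, against the 8.0G that `free -h` reports for the guest.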

  
  Steps to reproduce:
  1 - Install an Ubuntu KVM guest with hugepage memory backing.
  2 - git clone the latest libhugetlbfs from https://github.com/libhugetlbfs/libhugetlbfs.git
  3 - Configure hugepages in the guest and run make check.

  xmon is configured on the system.
  The system prints call traces and drops into the xmon console:

  HUGETLB_VERBOSE=1 HUGETLB_MORECORE=yes heap-overflow (16M: 64):       [  281.735713] Unable to handle kernel paging request for data at address 0x4200000000328e38
  [  281.735804] Faulting instruction address: 0xc00000000027b410
  cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8c3730]
      pc: c00000000027b410: shrink_active_list+0x300/0x4d0
      lr: c00000000027b3f4: shrink_active_list+0x2e4/0x4d0
      sp: c0000001fa8c39b0
     msr: 800000010280b033
     dar: 4200000000328e38
   dsisr: 42000000
    current = 0xc0000001fa8adc00
    paca    = 0xc00000000fb80900         softe: 0        irq_happened: 0x01
      pid   = 50, comm = kswapd0
  Linux version 4.8.0-17-generic (buildd@bos01-ppc64el-025) (gcc version 6.2.0 20160914 (Ubuntu 6.2.0-3ubuntu15) ) #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 (Ubuntu 4.8.0-17.19-generic 4.8.0-rc7)
  enter ? for help
  [c0000001fa8c3aa0] c00000000027bbdc shrink_node_memcg+0x5fc/0x800
  [c0000001fa8c3bc0] c00000000027bf0c shrink_node+0x12c/0x3f0
  [c0000001fa8c3c80] c00000000027d500 kswapd+0x460/0x990
  [c0000001fa8c3d80] c0000000000fd120 kthread+0x110/0x130
  [c0000001fa8c3e30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

  xmon logs:

  1:mon> e
  cpu 0x1: Vector: 300 (Data Access) at [c0000001fa8e7730]
      pc: c00000000027b410: shrink_active_list+0x300/0x4d0
      lr: c00000000027b3f4: shrink_active_list+0x2e4/0x4d0
      sp: c0000001fa8e79b0
     msr: 800000010280b033
     dar: 42000000000c58d0
   dsisr: 42000000
    current = 0xc0000001fa8a0000
    paca    = 0xc00000000fb80900         softe: 0        irq_happened: 0x01
      pid   = 50, comm = kswapd0
  Linux version 4.8.0-17-generic (buildd@bos01-ppc64el-025) (gcc version 6.2.0 20160914 (Ubuntu 6.2.0-3ubuntu15) ) #19-Ubuntu SMP Sun Sep 25 06:35:40 UTC 2016 (Ubuntu 4.8.0-17.19-generic 4.8.0-rc7)

  1:mon> r
  R00 = c00000000027b3f4   R16 = c0000001fffcfe00
  R01 = c0000001fa8e79b0   R17 = 000000000000010a
  R02 = c0000000014e5e00   R18 = 42000000000cbdd0
  R03 = 0000000000000001   R19 = c0000001fffc6300
  R04 = 0000000000000005   R20 = c0000001fa8e79e0
  R05 = 0000000000000000   R21 = c0000001fe144800
  R06 = f0000000003bc9a0   R22 = 0000000000000001
  R07 = 00000001fee30000   R23 = 0000000000000005
  R08 = 000000000000002a   R24 = 000000000000207d
  R09 = 0000000000000000   R25 = 0000000000000100
  R10 = c000000001034e86   R26 = 0000000000000200
  R11 = 0000000000000000   R27 = c0000001fa8e79d0
  R12 = 0000000000002200   R28 = c0000001fa8e7ca0
  R13 = c00000000fb80900   R29 = 0000000000000040
  R14 = f000000000380000   R30 = c0000001fe144800
  R15 = f000000000380020   R31 = c0000001fa8e79f0
  pc  = c00000000027b410 shrink_active_list+0x300/0x4d0
  cfar= c0000000000b47a4 kvmppc_call_hv_entry+0x130/0x134
  lr  = c00000000027b3f4 shrink_active_list+0x2e4/0x4d0
  msr = 800000010280b033   cr  = 24022222
  ctr = c0000000002ba900   xer = 0000000020000000   trap =  300
  dar = 42000000000c58d0   dsisr = 42000000

  1:mon> t
  [c0000001fa8e7aa0] c00000000027bc70 shrink_node_memcg+0x690/0x800
  [c0000001fa8e7bc0] c00000000027bf0c shrink_node+0x12c/0x3f0
  [c0000001fa8e7c80] c00000000027d500 kswapd+0x460/0x990
  [c0000001fa8e7d80] c0000000000fd120 kthread+0x110/0x130
  [c0000001fa8e7e30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c

  == Comment: #2 - Santhosh G <santh...@in.ibm.com> - 2016-09-27 04:28:02 ==
  Something similar to this issue is observed when the mm tests in LTP are run.

  Call Traces Output:
  oom01       0  TINFO [ 2577.866629] Unable to handle kernel paging request for data at address 0x42000000004311d0
  [ 2577.866759] Faulting instruction address: 0xc00000000027b410
  [ 2577.866846] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 2577.866911] SMP NR_CPUS=2048 NUMA pSeries
  [ 2577.866980] Modules linked in: vmx_crypto ip_tables x_tables autofs4 ibmvscsi crc32c_vpmsum
  [ 2577.867152] CPU: 119 PID: 116856 Comm: oom01 Not tainted 4.8.0-17-generic #19-Ubuntu
  [ 2577.867252] task: c000000db5d56000 task.stack: c00000031a898000
  [ 2577.867334] NIP: c00000000027b410 LR: c00000000027b3f4 CTR: 0000000000000006
  [ 2577.867433] REGS: c00000031a89b3e0 TRAP: 0300   Not tainted  (4.8.0-17-generic)
  [ 2577.867531] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28422222  XER: 20000000
  [ 2577.867864] CFAR: c0000000000b477c DAR: 42000000004311d0 DSISR: 42000000 SOFTE: 0
  GPR00: c00000000027b3f4 c00000031a89b660 c0000000014e5e00 0000000000000001 
  GPR04: 0000000000000005 0000000000000000 f000000000252960 0000000de7db0000 
  GPR08: 000000000000007d 0000000000000000 c000000001034e86 0000000000000000 
  GPR12: 0000000000002200 c00000000fbc2f00 f000000001ec8000 f000000001ec8020 
  GPR16: c000000defb93e00 0000000000000111 42000000004376d0 c000000defb8a300 
  GPR20: c00000031a89b690 c000000dee0a4800 0000000000000001 0000000000000005 
  GPR24: 0000000000023657 0000000000000100 0000000000000200 c00000031a89b680 
  GPR28: c00000031a89ba00 0000000000000040 c000000dee0a4800 c00000031a89b6a0 
  [ 2577.869185] NIP [c00000000027b410] shrink_active_list+0x300/0x4d0
  [ 2577.869268] LR [c00000000027b3f4] shrink_active_list+0x2e4/0x4d0
  [ 2577.869349] Call Trace:
  [ 2577.869385] [c00000031a89b660] [c00000000027b3f4] shrink_active_list+0x2e4/0x4d0 (unreliable)
  [ 2577.869518] [c00000031a89b750] [c00000000027bc70] shrink_node_memcg+0x690/0x800
  [ 2577.869633] [c00000031a89b870] [c00000000027bf0c] shrink_node+0x12c/0x3f0
  [ 2577.869733] [c00000031a89b930] [c00000000027c308] do_try_to_free_pages+0x138/0x480
  [ 2577.869849] [c00000031a89b9e0] [c00000000027c74c] try_to_free_pages+0xfc/0x270
  [ 2577.869963] [c00000031a89ba70] [c000000000264afc] __alloc_pages_nodemask+0x72c/0xee0
  [ 2577.870081] [c00000031a89bc30] [c0000000002e1758] alloc_pages_vma+0x108/0x360
  [ 2577.870181] [c00000031a89bcc0] [c0000000002ac5d4] handle_mm_fault+0x1024/0x14e0
  [ 2577.870299] [c00000031a89bd80] [c000000000b90d50] do_page_fault+0x350/0x7d0
  [ 2577.870435] [c00000031a89be30] [c000000000008948] handle_page_fault+0x10/0x30
  [ 2577.870532] Instruction dump:
  [ 2577.870578] 4bffbc19 7cb100d0 7ee4bb78 7e639b78 4800dbf9 60000000 892d023c 2f890000
  [ 2577.870716] 409e01a4 7c2004ac 39200000 38600001 <91329b00> 4bd99b85 60000000 7fe3fb78
  [ 2577.870845] ---[ end trace b2b062e289b7708f ]---
  [ 2577.873701]
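
The faulting word marked `<91329b00>` in the instruction dump can be checked against the register dumps. The sketch below decodes it as a PowerPC D-form instruction (primary opcode 36 is `stw`) and recomputes the effective address from GPR18. It also names the two DSISR bits set in `42000000`, using the meanings as I recall them from the kernel's `arch/powerpc/include/asm/reg.h` (`DSISR_NOHPTE`, `DSISR_ISSTORE`); treat that table as an assumption. The helper names are illustrative only and are not xmon or kernel APIs.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
STW_OPCODE = 36  # primary opcode of "stw RS, D(RA)"

# Assumed meanings of the two bits set in DSISR 0x42000000.
DSISR_BITS = {
    0x40000000: "no translation found",
    0x02000000: "access was a store",
}


def decode_d_form(word):
    """Split a 32-bit D-form instruction into (opcode, rs, ra, displacement)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:  # the 16-bit displacement is signed
        disp -= 0x10000
    return opcode, rs, ra, disp


def effective_address(word, gprs):
    """Compute (RA|0) + D for a D-form load/store, wrapped to 64 bits."""
    _, _, ra, disp = decode_d_form(word)
    base = 0 if ra == 0 else gprs[ra]
    return (base + disp) & MASK64


def describe_dsisr(value):
    """List the known DSISR conditions set in value."""
    return [text for bit, text in DSISR_BITS.items() if value & bit]


if __name__ == "__main__":
    word = 0x91329B00
    opcode, rs, ra, disp = decode_d_form(word)
    print(f"opcode={opcode} rs=r{rs} ra=r{ra} disp={disp:#x}")
    # GPR18 from the oom01 oops and R18 from the xmon register dump.
    for r18 in (0x42000000004376D0, 0x42000000000CBDD0):
        print(f"r18={r18:#018x} -> ea={effective_address(word, {18: r18}):#018x}")
    print(describe_dsisr(0x42000000))
```

Under these assumptions the word decodes to `stw r9,-0x6500(r18)`, and both computed addresses equal the reported DAR values (`42000000004311d0` for oom01, `42000000000c58d0` in xmon). Both crashes are therefore the same store through a base pointer in r18 that is not a valid kernel address.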

  == Comment: #3 - Chandan Kumar <ckuma...@in.ibm.com> - 2016-09-27 05:18:41 ==

  
  == Comment: #13 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-04 11:51:59 ==

  
  == Comment: #14 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-05 04:18:52 ==

  
  == Comment: #15 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-10-05 05:12:41 ==

  
  == Comment: #17 - Luciano Chavez <cha...@us.ibm.com> - 2016-10-05 15:40:06 ==

  
  == Comment: #22 - Richard M. Scheller <rsche...@us.ibm.com> - 2016-10-06 22:21:26 ==
  (In reply to comment #21)
  > Patched ubuntu kernel packages based on 4.8.0-19.21 are available here:
  > http://www.lab.toulouse-stg.fr.ibm.com/~laurent/BZ146511/
  > 
  > laurent@test1:~$ uname -v
  > #21+bz146511 SMP Thu Oct 6 16:37:38 CEST 2016
  > 
  > Please give a try.

  I have run with this patched kernel on four guests on my Ubuntu 16.10
  KVM host.  Three of my guests are NOT backed by huge pages.  The
  fourth guest is backed by huge pages.  All four of these guests have
  PCI passthrough adapters.

  All four of these guests crashed and rebooted within a few hours with
  out-of-memory errors, both with the standard Ubuntu 4.8.0-19 kernel
  and with this patched kernel.

  There are five other guests on the same host system which do not have
  PCI passthrough adapters.  None of these guests are reproducing the
  out-of-memory errors, despite running the same test suites.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1632458/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
