** Description changed:

  Out of 5 attempts, 4 hung around the following test in the
  ubuntu_kernel_selftests net sub-category:
   # selftests: net: reuseport_bpf_cpu
  
  First attempt:
  23:21:32 DEBUG| [stdout] ok 2 selftests: net: reuseport_bpf_cpu
  23:21:32 DEBUG| [stdout] # selftests: net: reuseport_bpf_numa
  23:21:32 DEBUG| [stdout] # ---- IPv4 UDP ----
  (hang here)
  
  Second attempt:
  10:17:35 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  10:17:35 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  10:17:35 DEBUG| [stdout] # ---- IPv4 UDP ----
  10:17:35 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  10:17:35 DEBUG| [stdout] # send cpu 159, receive socket 159
  10:17:35 DEBUG| [stdout] # ---- IPv6 TCP ----
  (hang here)
  
  Third attempt failed because of test timeout:
  12:46:16 DEBUG| [stdout] # [FAIL]
  12:46:16 DEBUG| [stdout] # --------------------
  12:46:16 DEBUG| [stdout] # running psock_tpacket test
  12:46:16 DEBUG| [stdout] # --------------------
  13:14:13 INFO | Timer expired (1800 sec.), nuking pid 161853
  
  Fourth attempt:
  07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  07:41:51 DEBUG| [stdout] # ---- IPv4 UDP ----
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] # ---- IPv6 UDP ----
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  07:41:51 DEBUG| [stdout] # send cpu 1, receive socket 1
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 157, receive socket 157
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] # ---- IPv4 TCP ----
  (test hang here)
  
  Fifth attempt:
  04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  04:29:17 DEBUG| [stdout] # ---- IPv4 UDP ----
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] # ---- IPv6 UDP ----
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] # ---- IPv4 TCP ----
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
  (test hang here)
  
  I tried to run tests in this sru-misc suite in the following order:
-         'hwclock',
-         'ubuntu_bpf',
-         'ubuntu_bpf_jit',
-         'ubuntu_kernel_selftests',
-         'ubuntu_lxc',
-         'ubuntu_seccomp',
-         'ubuntu_unionmount_ovlfs',
-         'ubuntu_cts_kernel',
-         'ubuntu_kvm_unit_tests',
+         'hwclock',
+         'ubuntu_bpf',
+         'ubuntu_bpf_jit',
+         'ubuntu_kernel_selftests',
+         'ubuntu_lxc',
+         'ubuntu_seccomp',
+         'ubuntu_unionmount_ovlfs',
+         'ubuntu_cts_kernel',
+         'ubuntu_kvm_unit_tests',
  I ran them one by one on this node, but could not reproduce this issue.
  
  I tried to watch dmesg when this happens, but there is no information
  there; the system just reboots automatically and silently.
+ 
+ This is what you can see from syslog after reboot:
+ Mar 12 04:27:39 modoc kernel: [  536.668305] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.684547] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.700907] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.717246] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.719288] page:c00c000000c4f000 refcount:1 mapcount:0 mapping:c000000f8cfe0fd1 index:0x7611c3e
+ Mar 12 04:27:39 modoc kernel: [  536.719289] anon
+ Mar 12 04:27:39 modoc kernel: [  536.719291] flags: 0x3ffff800080024(uptodate|active|swapbacked)
+ Mar 12 04:27:39 modoc kernel: [  536.719294] raw: 003ffff800080024 5deadbeef0000100 5deadbeef0000122 c000000f8cfe0fd1
+ Mar 12 04:27:39 modoc kernel: [  536.719295] raw: 0000000007611c3e 0000000000000000 00000001ffffffff c000000fcfd1c000
+ Mar 12 04:27:39 modoc kernel: [  536.719296] page dumped because: unmovable page
+ Mar 12 04:27:39 modoc kernel: [  536.719296] page->mem_cgroup:c000000fcfd1c000
+ Mar 12 04:27:39 modoc kernel: [  536.735465] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.751848] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.768210] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.784450] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.800756] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.817006] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.833133] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.849205] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.865448] Injecting error (-12) to MEM_GOING_OFFLINE
+ 
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Mar
 12 04:35:41 modoc systemd[1]: Starting Flush Journal to Persistent Storage...
+ Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: Page sizes from device-tree:
+ Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
+ Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
+ Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
+ Mar 12 04:35:41 modoc systemd[1]: Started udev Kernel Device Manager.
+ 
+ In the log above, the run of "^@" (NUL) characters marks the reboot. The
+ system appears to have been running the memory-hotplug test at that point.
  
  Maybe we need to use IPMI to see if there is anything on the console.
  
  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: linux-image-5.3.0-42-generic 5.3.0-42.34
  ProcVersionSignature: Ubuntu 5.3.0-42.34-generic 5.3.18
  Uname: Linux 5.3.0-42-generic ppc64le
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Mar 12 04:33 seq
   crw-rw---- 1 root audio 116, 33 Mar 12 04:33 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.11-0ubuntu8.5
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Thu Mar 12 09:42:24 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  PciMultimedia:
  
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
  
  ProcKernelCmdLine: root=UUID=b2a867ce-7813-4785-8861-4e7de2ac39b4 ro console=hvc0
  ProcLoadAvg: 0.07 0.02 0.00 1/1461 86637
  ProcLocks:
   1: POSIX  ADVISORY  WRITE 3799 00:18:841 0 EOF
   2: POSIX  ADVISORY  WRITE 3526 00:18:743 0 EOF
   3: FLOCK  ADVISORY  WRITE 3720 00:18:837 0 EOF
  ProcSwaps:
   Filename                             Type            Size    Used    Priority
   /swap.img                               file         8388544 0       -2
  ProcVersion: Linux version 5.3.0-42-generic (buildd@bos02-ppc64el-006) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #34-Ubuntu SMP Fri Feb 28 05:49:17 UTC 2020
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-42-generic N/A
   linux-backports-modules-5.3.0-42-generic  N/A
   linux-firmware                            1.183.4
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  VarLogDump_list: total 0
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_dscr: DSCR is 0
  cpu_freq:
   min: 3.694 GHz (cpu 159)
   max: 3.695 GHz (cpu 1)
   avg: 3.694 GHz
  cpu_runmode:
   Could not retrieve current diagnostics mode,
   No kernel interface to firmware
  cpu_smt: SMT=8

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1867155

Title:
  P8 node modoc will reboot automatically when running the sru_misc test
  suite

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Out of 5 attempts, 4 hung around the following test in the
  ubuntu_kernel_selftests net sub-category:
   # selftests: net: reuseport_bpf_cpu

  First attempt:
  23:21:32 DEBUG| [stdout] ok 2 selftests: net: reuseport_bpf_cpu
  23:21:32 DEBUG| [stdout] # selftests: net: reuseport_bpf_numa
  23:21:32 DEBUG| [stdout] # ---- IPv4 UDP ----
  (hang here)

  Second attempt:
  10:17:35 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  10:17:35 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  10:17:35 DEBUG| [stdout] # ---- IPv4 UDP ----
  10:17:35 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  10:17:35 DEBUG| [stdout] # send cpu 159, receive socket 159
  10:17:35 DEBUG| [stdout] # ---- IPv6 TCP ----
  (hang here)

  Third attempt failed because of test timeout:
  12:46:16 DEBUG| [stdout] # [FAIL]
  12:46:16 DEBUG| [stdout] # --------------------
  12:46:16 DEBUG| [stdout] # running psock_tpacket test
  12:46:16 DEBUG| [stdout] # --------------------
  13:14:13 INFO | Timer expired (1800 sec.), nuking pid 161853

  Fourth attempt:
  07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  07:41:51 DEBUG| [stdout] # ---- IPv4 UDP ----
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] # ---- IPv6 UDP ----
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  07:41:51 DEBUG| [stdout] # send cpu 1, receive socket 1
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 157, receive socket 157
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] # ---- IPv4 TCP ----
  (test hang here)

  Fifth attempt:
  04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  04:29:17 DEBUG| [stdout] # ---- IPv4 UDP ----
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] # ---- IPv6 UDP ----
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] # ---- IPv4 TCP ----
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
  (test hang here)
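
To compare where each attempt stopped, the excerpts above can be reduced to
"last test started, last protocol section, last line printed". The following
is only a minimal sketch: it assumes the autotest DEBUG log format shown in
the excerpts, and the function name last_progress is made up for
illustration.

```python
import re

# "# selftests: net: reuseport_bpf_cpu" marks the start of a test.
# Result lines such as "ok 1 selftests: ..." lack the leading "# " and
# therefore do not match.
TEST_RE = re.compile(r"# selftests: (\w+): (\S+)")
# "# ---- IPv4 UDP ----" marks a protocol section inside a test.
SECTION_RE = re.compile(r"# ---- (.+?) ----")


def last_progress(lines):
    """Return (test_name, section, last_line) seen before the log stops."""
    test = section = last = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        last = line
        test_match = TEST_RE.search(line)
        if test_match:
            test = test_match.group(2)
            section = None  # a new test starts with no section yet
            continue
        section_match = SECTION_RE.search(line)
        if section_match:
            section = section_match.group(1)
    return test, section, last
```

Feeding it the lines of one attempt's log (for example from
open(path, errors="replace")) gives a one-line summary per attempt.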

  I tried to run tests in this sru-misc suite in the following order:
          'hwclock',
          'ubuntu_bpf',
          'ubuntu_bpf_jit',
          'ubuntu_kernel_selftests',
          'ubuntu_lxc',
          'ubuntu_seccomp',
          'ubuntu_unionmount_ovlfs',
          'ubuntu_cts_kernel',
          'ubuntu_kvm_unit_tests',
  I ran them one by one on this node, but could not reproduce this issue.

  I tried to watch dmesg when this happens, but there is no information
  there; the system just reboots automatically and silently.

  This is what you can see from syslog after reboot:
  Mar 12 04:27:39 modoc kernel: [  536.668305] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.684547] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.700907] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.717246] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.719288] page:c00c000000c4f000 refcount:1 mapcount:0 mapping:c000000f8cfe0fd1 index:0x7611c3e
  Mar 12 04:27:39 modoc kernel: [  536.719289] anon
  Mar 12 04:27:39 modoc kernel: [  536.719291] flags: 0x3ffff800080024(uptodate|active|swapbacked)
  Mar 12 04:27:39 modoc kernel: [  536.719294] raw: 003ffff800080024 5deadbeef0000100 5deadbeef0000122 c000000f8cfe0fd1
  Mar 12 04:27:39 modoc kernel: [  536.719295] raw: 0000000007611c3e 0000000000000000 00000001ffffffff c000000fcfd1c000
  Mar 12 04:27:39 modoc kernel: [  536.719296] page dumped because: unmovable page
  Mar 12 04:27:39 modoc kernel: [  536.719296] page->mem_cgroup:c000000fcfd1c000
  Mar 12 04:27:39 modoc kernel: [  536.735465] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.751848] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.768210] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.784450] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.800756] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.817006] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.833133] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.849205] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.865448] Injecting error (-12) to MEM_GOING_OFFLINE
  
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Mar
 12 04:35:41 modoc systemd[1]: Starting Flush Journal to Persistent Storage...
  Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: Page sizes from device-tree:
  Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
  Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
  Mar 12 04:35:41 modoc kernel: [    0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
  Mar 12 04:35:41 modoc systemd[1]: Started udev Kernel Device Manager.

  In the log above, the run of "^@" (NUL) characters marks the reboot. The
  system appears to have been running the memory-hotplug test at that point.
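
The same check can be scripted for future occurrences. This is a rough
sketch, not a finished tool: it assumes the raw syslog contains real NUL
bytes (which pagers display as ^@) and that the first entry after the reboot
sits on the same physical line as the NUL padding, as in the excerpt above.
The function names are made up for illustration. It also decodes the injected
error number, which on Linux resolves to ENOMEM for -12.

```python
import errno
import re

# Matches the notifier error-injection messages seen in the syslog excerpt.
INJECT_RE = re.compile(r"Injecting error \((-?\d+)\) to (\w+)")


def find_reboot_gaps(lines):
    """Return (last_line_before, first_line_after) pairs around NUL runs.

    A line containing NUL bytes is treated as the point where the log was
    cut off by an unclean reboot.
    """
    gaps = []
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if "\x00" in line:
            # Strip the NUL padding and keep whatever text follows it.
            after = line.replace("\x00", "").strip()
            gaps.append((previous, after))
            previous = after or previous
        elif line.strip():
            previous = line
    return gaps


def decode_injected_error(line):
    """Return (errno_name, notifier_event) for an injection message, else None."""
    match = INJECT_RE.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    return errno.errorcode.get(code, str(code)), match.group(2)
```

On an affected node this could be run over
open("/var/log/syslog", errors="replace"); each returned pair shows the last
message logged before the reboot and the first one after it.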

  Maybe we need to use IPMI to see if there is anything on the console.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: linux-image-5.3.0-42-generic 5.3.0-42.34
  ProcVersionSignature: Ubuntu 5.3.0-42.34-generic 5.3.18
  Uname: Linux 5.3.0-42-generic ppc64le
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Mar 12 04:33 seq
   crw-rw---- 1 root audio 116, 33 Mar 12 04:33 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.11-0ubuntu8.5
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Thu Mar 12 09:42:24 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  PciMultimedia:

  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: root=UUID=b2a867ce-7813-4785-8861-4e7de2ac39b4 ro console=hvc0
  ProcLoadAvg: 0.07 0.02 0.00 1/1461 86637
  ProcLocks:
   1: POSIX  ADVISORY  WRITE 3799 00:18:841 0 EOF
   2: POSIX  ADVISORY  WRITE 3526 00:18:743 0 EOF
   3: FLOCK  ADVISORY  WRITE 3720 00:18:837 0 EOF
  ProcSwaps:
   Filename                             Type            Size    Used    Priority
   /swap.img                               file         8388544 0       -2
  ProcVersion: Linux version 5.3.0-42-generic (buildd@bos02-ppc64el-006) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #34-Ubuntu SMP Fri Feb 28 05:49:17 UTC 2020
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-42-generic N/A
   linux-backports-modules-5.3.0-42-generic  N/A
   linux-firmware                            1.183.4
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  VarLogDump_list: total 0
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_dscr: DSCR is 0
  cpu_freq:
   min: 3.694 GHz (cpu 159)
   max: 3.695 GHz (cpu 1)
   avg: 3.694 GHz
  cpu_runmode:
   Could not retrieve current diagnostics mode,
   No kernel interface to firmware
  cpu_smt: SMT=8

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1867155/+subscriptions
