------- Comment From pavsu...@in.ibm.com 2019-11-06 01:44 EDT-------
root@ltc-wspoon11:~# add-apt-repository ppa:ubuntu-power-triage/lp1848127

More info: https://launchpad.net/~ubuntu-power-triage/+archive/ubuntu/lp1848127
Press [ENTER] to continue or Ctrl-c to cancel adding it.

Get:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67  InRelease
Ign:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67  InRelease
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67  Release [574 B]
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67  Release [574 B]
Hit:4 http://us.ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Hit:6 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
Ign:7 http://ddebs.ubuntu.com bionic InRelease
Hit:8 http://us.ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Ign:9 http://ddebs.ubuntu.com bionic-updates InRelease
Hit:10 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic 
InRelease
Ign:11 http://ddebs.ubuntu.com bionic-proposed InRelease
Hit:12 http://ddebs.ubuntu.com bionic Release
Hit:14 http://ddebs.ubuntu.com bionic-updates Release
Hit:16 http://ddebs.ubuntu.com bionic-proposed Release
Reading package lists... Done
root@ltc-wspoon11:~# apt-get update
Get:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67  InRelease
Ign:1 file:/var/cuda-repo-10-1-local-10.1.152-418.67  InRelease
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67  Release [574 B]
Get:2 file:/var/cuda-repo-10-1-local-10.1.152-418.67  Release [574 B]
Hit:4 http://us.ports.ubuntu.com/ubuntu-ports bionic InRelease
Hit:5 http://us.ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Ign:6 http://ddebs.ubuntu.com bionic InRelease
Hit:7 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
Hit:8 http://us.ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Hit:9 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic 
InRelease
Ign:10 http://ddebs.ubuntu.com bionic-updates InRelease
Ign:11 http://ddebs.ubuntu.com bionic-proposed InRelease
Hit:12 http://ddebs.ubuntu.com bionic Release
Hit:14 http://ddebs.ubuntu.com bionic-updates Release
Hit:16 http://ddebs.ubuntu.com bionic-proposed Release
Reading package lists... Done

root@ltc-wspoon11:~# apt-get install linux-image-unsigned-4.15.0-68-generic/bionic
Reading package lists... Done
Building dependency tree
Reading state information... Done
Selected version '4.15.0-68.77~lp1848127+build.1' (lp1848127:18.04/bionic 
[ppc64el]) for 'linux-image-unsigned-4.15.0-68-generic'
The following additional packages will be installed:
linux-modules-4.15.0-68-generic
Suggested packages:
fdutils linux-doc-4.15.0 | linux-source-4.15.0 linux-headers-4.15.0-68-generic
The following NEW packages will be installed:
linux-image-unsigned-4.15.0-68-generic linux-modules-4.15.0-68-generic
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 18.6 MB of archives.
After this operation, 92.8 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic/main 
ppc64el linux-modules-4.15.0-68-generic ppc64el 4.15.0-68.77~lp1848127+build.1 
[12.1 MB]
Get:2 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic/main 
ppc64el linux-image-unsigned-4.15.0-68-generic ppc64el 
4.15.0-68.77~lp1848127+build.1 [6,532 kB]
Fetched 18.6 MB in 6s (3,277 kB/s)
Selecting previously unselected package linux-modules-4.15.0-68-generic.
(Reading database ... 183240 files and directories currently installed.)
Preparing to unpack 
.../linux-modules-4.15.0-68-generic_4.15.0-68.77~lp1848127+build.1_ppc64el.deb 
...
Unpacking linux-modules-4.15.0-68-generic (4.15.0-68.77~lp1848127+build.1) ...
Selecting previously unselected package linux-image-unsigned-4.15.0-68-generic.
Preparing to unpack 
.../linux-image-unsigned-4.15.0-68-generic_4.15.0-68.77~lp1848127+build.1_ppc64el.deb
 ...
Unpacking linux-image-unsigned-4.15.0-68-generic 
(4.15.0-68.77~lp1848127+build.1) ...
Setting up linux-modules-4.15.0-68-generic (4.15.0-68.77~lp1848127+build.1) ...
Setting up linux-image-unsigned-4.15.0-68-generic 
(4.15.0-68.77~lp1848127+build.1) ...
I: /boot/vmlinux.old is now a symlink to vmlinux-4.15.0-66-generic
I: /boot/initrd.img.old is now a symlink to initrd.img-4.15.0-66-generic
I: /boot/vmlinux is now a symlink to vmlinux-4.15.0-68-generic
I: /boot/initrd.img is now a symlink to initrd.img-4.15.0-68-generic
Processing triggers for linux-image-unsigned-4.15.0-68-generic 
(4.15.0-68.77~lp1848127+build.1) ...
/etc/kernel/postinst.d/dkms:
* dkms: running auto installation service for kernel 4.15.0-68-generic
Error! Your kernel headers for kernel 4.15.0-68-generic cannot be found.
Please install the linux-headers-4.15.0-68-generic package,
or use the --kernelsourcedir option to tell DKMS where it's located
...done.
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-68-generic
/etc/kernel/postinst.d/kdump-tools:
kdump-tools: Generating /var/lib/kdump/initrd.img-4.15.0-68-generic
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/kdump-tools.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinux-5.3.0-18-generic
Found initrd image: /boot/initrd.img-5.3.0-18-generic
Found linux image: /boot/vmlinux-4.15.0-68-generic
Found initrd image: /boot/initrd.img-4.15.0-68-generic
Found linux image: /boot/vmlinux-4.15.0-66-generic
Found initrd image: /boot/initrd.img-4.15.0-66-generic
Found linux image: /boot/vmlinux-4.15.0-65-generic
Found initrd image: /boot/initrd.img-4.15.0-65-generic
done

root@ltc-wspoon11:~# apt-get install linux-headers-4.15.0-68-generic
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
linux-headers-4.15.0-68
The following NEW packages will be installed:
linux-headers-4.15.0-68 linux-headers-4.15.0-68-generic
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.6 MB of archives.
After this operation, 86.2 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic/main 
ppc64el linux-headers-4.15.0-68 all 4.15.0-68.77~lp1848127+build.1 [11.3 MB]
Get:2 http://ppa.launchpad.net/ubuntu-power-triage/lp1848127/ubuntu bionic/main 
ppc64el linux-headers-4.15.0-68-generic ppc64el 4.15.0-68.77~lp1848127+build.1 
[1,357 kB]
Fetched 12.6 MB in 6s (2,150 kB/s)
Selecting previously unselected package linux-headers-4.15.0-68.
(Reading database ... 184392 files and directories currently installed.)
Preparing to unpack 
.../linux-headers-4.15.0-68_4.15.0-68.77~lp1848127+build.1_all.deb ...
Unpacking linux-headers-4.15.0-68 (4.15.0-68.77~lp1848127+build.1) ...
Selecting previously unselected package linux-headers-4.15.0-68-generic.
Preparing to unpack 
.../linux-headers-4.15.0-68-generic_4.15.0-68.77~lp1848127+build.1_ppc64el.deb 
...
Unpacking linux-headers-4.15.0-68-generic (4.15.0-68.77~lp1848127+build.1) ...
Setting up linux-headers-4.15.0-68 (4.15.0-68.77~lp1848127+build.1) ...
Setting up linux-headers-4.15.0-68-generic (4.15.0-68.77~lp1848127+build.1) ...
/etc/kernel/header_postinst.d/dkms:
* dkms: running auto installation service for kernel 4.15.0-68-generic

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
unset ARCH; env NV_VERBOSE=1 'make' -j16 NV_EXCLUDE_BUILD_MODULES='' 
KERNEL_UNAME=4.15.0-68-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 
SYSSRC=/lib/modules/4.15.0-68-generic/build LD=/usr/bin/ld.bfd modules......
cleaning build area...

DKMS: build completed.

nvidia.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-68-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-68-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-68-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.15.0-68-generic/updates/dkms/

depmod...

DKMS: install completed.
...done.
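
The DKMS error earlier in this log came from installing the kernel image before its headers; the module build succeeded once linux-headers-4.15.0-68-generic was in place. A minimal sketch of a pre-check, assuming the build tree lives at /lib/modules/<release>/build (the SYSSRC path shown in the make line above). The function name and the optional root argument are hypothetical, and DKMS may apply a different test internally.

```shell
# Hypothetical helper: succeeds when the kernel build tree for the given
# release contains a Makefile. ROOT defaults to the real filesystem root and
# exists only so the check can be tried against a scratch directory.
headers_present() {
    kver="$1"
    root="${2:-}"
    # -e follows symlinks, so a dangling "build" link counts as missing.
    [ -e "$root/lib/modules/$kver/build/Makefile" ]
}
```

For example, `headers_present 4.15.0-68-generic || apt-get install linux-headers-4.15.0-68-generic` would install the headers only when they are absent.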

root@ltc-wspoon4:~# uname -a
Linux ltc-wspoon4 4.15.0-68-generic #77~lp1848127+build.1-Ubuntu SMP Mon Oct 28 
19:57:54 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
root@ltc-wspoon4:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

root@ltc-wspoon4:~# ./statedisable.sh
./statedisable.sh: line 10: 
/sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
./statedisable.sh: line 11: 
/sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory

root@ltc-wspoon4:~# ./run_workload.sh

root@ltc-wspoon4:~# ./scom_addr_p9.sh 0x1001080c 22
EQ[ 5]: 0x1501080c
EX[11]: 0x15010c0c
C[22]: 0x3601080c
root@ltc-wspoon4:~# getscom -c 0x8 0x15010c0c
0000000000000000

ltc-wspoon4 login: [  442.228985]   NIP [c00000000019ae5c]: osq_lock+0x15c/0x230
[  442.228985]   Initiator: CPU
[  442.228986]   Error type: UE [Load/Store]
[  442.228987]     Effective address: c000201cc76a9600
[  442.228988]     Physical address:  0000201cc76a0000
[  442.228988] opal: Hardware platform error: Unrecoverable Machine Check 
exception
[  442.228989] CPU: 109 PID: 9095 Comm: find Tainted: G   M              
4.15.0-68-generic #77~lp1848127+build.1-Ubuntu
[  442.228990] NIP:  c00000000019ae5c LR: c000000000e000a0 CTR: c000000000446e30
[  442.228991] REGS: c000201fff24bd70 TRAP: 0200   Tainted: G   M               
(5.0.0-33-generic)
[  442.228992] MSR:  9000000000209033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48002222  
XER: 00000000
[  442.228996] CFAR: c00000000019ae34 DAR: c000201cc76a9600 DSISR: 00008000 
IRQMASK: 0
[  442.228998] GPR00: c000000000e000a0 c000201c87babc30 c00000000184cb00 
c000000001731abc
[  442.229001] GPR04: 0000000000000000 0000000000000000 c000000001885c78 
0000000000000000
[  442.229003] GPR08: c000201cc76a9600 c000201cc7b69600 0000000000000004 
ffffffffffffffea
[  442.229005] GPR12: 0000000088002228 c000201fff686d80 00000ed7ab1e2b80 
00000ed7ab1e2b80
[  442.229008] GPR16: 00000ed7ab1f0e30 00000ed7ab1eec30 0000000000000101 
00007fffc662d8b8
[  442.229010] GPR20: 0000000000000000 0000000000030000 000000000001a9b7 
0000000000000018
[  442.229012] GPR24: c000001fc28a9dc8 c000201c7710c500 0000000000000000 
c000000001731ab0
[  442.229014] GPR28: 0000000000000002 c000000001731abc c000201c87babdb0 
c000000001731ab0
[  442.229017] NIP [c00000000019ae5c] osq_lock+0x15c/0x230
[  442.229018] LR [c000000000e000a0] __mutex_lock.isra.1+0x90/0x710
[  442.229018] Call Trace:
[  442.229019] [c000201c87babc30] [c000000000e00054] 
__mutex_lock.isra.1+0x44/0x710 (unreliable)
[  442.229020] [c000201c87babcd0] [c000000[  577.498732581,0] OPAL: Reboot 
requested due to Platform error.
[  577.498806187,3] OPAL: Reboot requested due to Platform error.0004facd0] 
kernfs_fop_readdir+0x200/0x3b0
[  442.229022] [c000201c87babd40] [c000000000446300] iterate_dir+0x200/0x280
[  442.229023] [c000201c87babd90] [c0000000004472a0] ksys_getdents64+0xa0/0x1a0
[  442.229024] [c000201c87babe00] [c0000000004473c8] sys_getdents64+0x28/0x110
[  442.229025] [c000201c87babe20] [c00000000000b288] system_call+0x5c/0x70
[  442.229026] Instruction dump:
[  442.229027] 60000000 38e00000 48000028 60000000 60000000 81490010 7c2004ac 
2faa0000
[  442.229030] 409effd4 7c210b78 7c421378 e9090008 <e9480000> 7faa4800 409effdc 
7c0004ac
[  443.416541] Disabling lock debugging due to kernel taint
[  443.416543] Severe Machine check interrupt [Not recovered]
[  443.416544]   NIP [c00000000019ad88]: osq_lock+0x88/0x230
[  443.416544]   Initiator: CPU
[  443.416545]   Error type: UE [Load/Store]
[  443.416545]     Effective address: c000201cc76a9610
[  443.416546]     Physical address:  0000201cc76a0000
[  443.416547] opal: Hardware platform error: Unrecoverable Machine Check 
exception
[  443.416548] CPU: 90 PID: 9020 Comm: find Tainted: G   M           
4.15.0-68-generic #77~lp1848127+build.1-Ubuntu
[  443.416549] NIP:  c00000000019ad88 LR: c000000000e000a0 CTR: c0000000004f8d60
[  443.416550] REGS: c000201fff32fd70 TRAP: 0200   Tainted: G   M               
(5.0.0-33-generic)
[  443.416551] MSR:  9000000000209033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002224  
XER: 00000000
[  443.416555] CFAR: c000000000e0009c DAR: 0000201cc76a9610 DSISR: 00008000 
IRQMASK: 0
[  443.416557] GPR00: c000000000e000a0 c000201c8f81fbc0 c00000000184cb00 
c000000001731abc
[  443.416559] GPR04: 0000000000000000 0000000000000000 0000201cc6370000 
c000000001339600
[  443.416561] GPR08: c000001ffec29600 c000201cc76a9600 0000001ffd8f0000 
c000001f96936300
[  443.416564] GPR12: 0000000084002228 c000201fff69ba00 00000334527e2b80 
0000000000000000
[  443.416566] GPR16: 0000000000000000 000003345280d440 0000000000000101 
00007fffcaffe858
[  443.416568] GPR20: 0000000000000000 00007fffcaffe7c8 0000000000000000 
0000000000000006
[  443.416570] GPR24: 000077194c155308 00000000000007ff c000201c8f81fd80 
000003345280d548
[  443.416572] GPR28: 0000000000000002 c000000001731abc c000000001731ab0 
c000000001731ab0
[  443.416575] NIP [c00000000019ad88] osq_lock+0x88/0x230
[  443.416576] LR [c000000000e000a0] __mutex_lock.isra.1+0x90/0x710
[  443.416576] Call Trace:
[  443.416577] [c000201c8f81fbc0] [c000000000e00054] 
__mutex_lock.isra.1+0x44/0x710 (unreliable)
[  443.416578] [c000201c8f81fc60] [c0000000004f8dac] 
kernfs_iop_getattr+0x4c/0xa0
[  443.416579] [c000201c8f81fca0] [c00000000042eac0] vfs_getattr_nosec+0x90/0xf0
[  443.416581] [c000201c8f81fce0] [c00000000042ed68] vfs_statx+0xc8/0x190
[  443.416582] [c000201c8f81fd60] [c00000000042f128] sys_newfstatat+0x48/0x90
[  443.416583] [c000201c8f81fe20] [c00000000000b288] system_call+0x5c/0x70
[  443.416584] Instruction dump:
[  443.416584] 2faa0000 419e00c4 394affff 3d020003 39085170 7d4a07b4 794a1f24 
7d48502a
[  443.416587] 7d075214 f9090008 7c2004ac 7d27512a <81490010> 2faa0000 409e0090 
782a0464
[  577.500377001,3]  ___________________________________________________________
[  577.500429242,3] <  Dangerous NVRAM option: opal-sw-xstop=enable
[  577.500480635,3]  -----------------------------------------------------------
[  577.500520165,3]                   \
[  577.500562271,3]                    \   WW
[  577.500614905,3]                       <^ \___/|
[  577.500657283,3]                        \      /
[  577.500704560,3]                         \_  _/
[  577.500743890,3]                           }{

The Linux HOST did not hang and it booted back after the above
injection.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1848127

Title:
  [LTCTest][OPAL][OP930] Machine hangs after injecting the Machine Check
  Error

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  In Progress

Bug description:
  [IMPACT]
  MCE test renders the system unresponsive on P9 Open Power hardware
  (Witherspoon).

  [TEST]
  A test kernel is available in ppa:ubuntu-power-triage/lp1848127. Please see 
the [OTHER] section for test details and comment #7 for results with the PPA 
kernel. 

  [FIX]
  IBM has identified the following patch that fixes this issue:
  commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
  Author: Balbir Singh <bsinghar...@gmail.com>
  Date:   Tue Aug 20 13:43:47 2019 +0530

      powerpc/mce: Fix MCE handling for huge pages

  [REGRESSION POTENTIAL]
  The patch applies only to the powerpc architecture and is limited in scope to
  MCE handling for huge pages. It does not touch any generic code. Any
  regression would be limited to powerpc MCE handling.

  [OTHER]
  == Comment: #0 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> - 2019-05-07 
23:31:20 ==
  Install P9 Open Power hardware with the latest OP930 firmware images built
  from the upstream op-build git tree.

  root@witherspoon:~# cat /etc/os-release
  ID="openbmc-phosphor"
  NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
  VERSION="ibm-v2.3"
  VERSION_ID="ibm-v2.3-476-g2d622cb-r32-0-g9973ab0"
  PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) 
ibm-v2.3"
  BUILD_ID="ibm-v2.3-476-g2d622cb-r32"
  root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
   open-power-witherspoon-v2.3-rc2-58-g59fd0743
          buildroot-2019.02.2-17-g93b841d204
          skiboot-v6.3-rc2
          hostboot-19a436e
          occ-58e422d
          linux-5.0.9-openpower1-p3a4d5a4
          petitboot-v1.10.3
          machine-xml-a6f4df3
          hostboot-binaries-hw043019a.940
          capp-ucode-p9-dd2-v4
          sbe-249671d
          hcode-hw040319a.940

  Then enable sw xstop manually using the command below:

  root@ltc-wspoon11:~# nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
  root@ltc-wspoon11:~# nvram -p ibm,skiboot --print-config
  "ibm,skiboot" Partition
  --------------------------
  experimental-fast-reset=1
  snarf-mode=noooooo
  opal-sw-xstop=enable
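
Before injecting an error it can help to confirm that the option really is set. A minimal sketch, assuming the `key=value` layout printed by `nvram -p ibm,skiboot --print-config` above; the function name `nvram_option` is hypothetical and is not part of the nvram tool.

```shell
# Hypothetical helper: reads "key=value" lines on stdin and prints the value
# of the requested key. Returns 1 if the key is not present. Header lines
# without '=' are read as a key with an empty value and simply never match.
nvram_option() {
    key="$1"
    while IFS='=' read -r k v; do
        if [ "$k" = "$key" ]; then
            printf '%s\n' "$v"
            return 0
        fi
    done
    return 1
}
```

Usage on the host: `nvram -p ibm,skiboot --print-config | nvram_option opal-sw-xstop`, which should print `enable` for the configuration listed above.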

  Then, from the Linux host, inject the MCE UE error on the machine as
  follows:

  root@ltc-wspoon11:~# ./probe_cpus.sh -L
  CHIP ID: 0 CORE ID: 0 THREADS: 4 CPUs:  0 1 2 3
  CHIP ID: 0 CORE ID: 1 THREADS: 4 CPUs:  4 5 6 7
  CHIP ID: 0 CORE ID: 2 THREADS: 4 CPUs:  8 9 10 11
  CHIP ID: 0 CORE ID: 3 THREADS: 4 CPUs:  12 13 14 15
  CHIP ID: 0 CORE ID: 6 THREADS: 4 CPUs:  16 17 18 19
  CHIP ID: 0 CORE ID: 7 THREADS: 4 CPUs:  20 21 22 23
  CHIP ID: 0 CORE ID: 8 THREADS: 4 CPUs:  24 25 26 27
  CHIP ID: 0 CORE ID: 9 THREADS: 4 CPUs:  28 29 30 31
  CHIP ID: 0 CORE ID: 10 THREADS: 4 CPUs:  32 33 34 35
  CHIP ID: 0 CORE ID: 11 THREADS: 4 CPUs:  36 37 38 39
  CHIP ID: 0 CORE ID: 12 THREADS: 4 CPUs:  40 41 42 43
  CHIP ID: 0 CORE ID: 13 THREADS: 4 CPUs:  44 45 46 47
  CHIP ID: 0 CORE ID: 16 THREADS: 4 CPUs:  48 49 50 51
  CHIP ID: 0 CORE ID: 17 THREADS: 4 CPUs:  52 53 54 55
  CHIP ID: 0 CORE ID: 18 THREADS: 4 CPUs:  56 57 58 59
  CHIP ID: 0 CORE ID: 19 THREADS: 4 CPUs:  60 61 62 63
  CHIP ID: 0 CORE ID: 20 THREADS: 4 CPUs:  64 65 66 67
  CHIP ID: 0 CORE ID: 21 THREADS: 4 CPUs:  68 69 70 71
  CHIP ID: 8 CORE ID: 6 THREADS: 4 CPUs:  72 73 74 75
  CHIP ID: 8 CORE ID: 7 THREADS: 4 CPUs:  76 77 78 79
  CHIP ID: 8 CORE ID: 8 THREADS: 4 CPUs:  80 81 82 83
  CHIP ID: 8 CORE ID: 9 THREADS: 4 CPUs:  84 85 86 87
  CHIP ID: 8 CORE ID: 10 THREADS: 4 CPUs:  88 89 90 91
  CHIP ID: 8 CORE ID: 11 THREADS: 4 CPUs:  92 93 94 95
  CHIP ID: 8 CORE ID: 12 THREADS: 4 CPUs:  96 97 98 99
  CHIP ID: 8 CORE ID: 13 THREADS: 4 CPUs:  100 101 102 103
  CHIP ID: 8 CORE ID: 14 THREADS: 4 CPUs:  104 105 106 107
  CHIP ID: 8 CORE ID: 15 THREADS: 4 CPUs:  108 109 110 111
  CHIP ID: 8 CORE ID: 16 THREADS: 4 CPUs:  112 113 114 115
  CHIP ID: 8 CORE ID: 17 THREADS: 4 CPUs:  116 117 118 119
  CHIP ID: 8 CORE ID: 18 THREADS: 4 CPUs:  120 121 122 123
  CHIP ID: 8 CORE ID: 19 THREADS: 4 CPUs:  124 125 126 127
  CHIP ID: 8 CORE ID: 20 THREADS: 4 CPUs:  128 129 130 131
  CHIP ID: 8 CORE ID: 21 THREADS: 4 CPUs:  132 133 134 135
  CHIP ID: 8 CORE ID: 22 THREADS: 4 CPUs:  136 137 138 139
  CHIP ID: 8 CORE ID: 23 THREADS: 4 CPUs:  140 141 142 143

  -----------------------------
  p[0]
     eq[0,1,2,3,4,5]
     ex[0,1,3,4,5,6,8,9,10]
      c[0,1,2,3,6,7,8,9,10,11,12,13,16,17,18,19,20,21]
  p[8]
     eq[1,2,3,4,5]
     ex[3,4,5,6,7,8,9,10,11]
      c[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
  -----------------------------

  ----------Processor Layout-------------------
  p[0]
          +---EQ00----+   +---EQ02----+   +---EQ04----+
          |EX-0    C0 |   |EX-4    C8 |   |EX-8    C16|
          + - - - - - +   + - - - - - +   + - - - - - +
          |EX-0    C1 |   |EX-4    C9 |   |EX-8    C17|
          + - - - - - +   + - - - - - +   + - - - - - +
          |EX-1    C2 |   |EX-5    C10|   |EX-9    C18|
          + - - - - - +   + - - - - - +   + - - - - - +
          |EX-1    C3 |   |EX-5    C11|   |EX-9    C19|
          +-----------+   +-----------+   +-----------+

          +---EQ01----+   +---EQ03----+   +---EQ05----+
          |           |   |EX-6    C12|   |EX-10   C20|
          + - - - - - +   + - - - - - +   + - - - - - +
          |           |   |EX-6    C13|   |EX-10   C21|
          + - - - - - +   + - - - - - +   + - - - - - +
          |EX-3    C6 |   |           |   |           |
          + - - - - - +   + - - - - - +   + - - - - - +
          |EX-3    C7 |   |           |   |           |
          +-----------+   +-----------+   +-----------+

  p[8]
          +---EQ00----+   +---EQ02----+   +---EQ04----+
          |           |   |EX-4    C8 |   |EX-8    C16|
          + - - - - - +   + - - - - - +   + - - - - - +
          |           |   |EX-4    C9 |   |EX-8    C17|
          + - - - - - +   + - - - - - +   + - - - - - +
          |           |   |EX-5    C10|   |EX-9    C18|
          + - - - - - +   + - - - - - +   + - - - - - +
          |           |   |EX-5    C11|   |EX-9    C19|
          +-----------+   +-----------+   +-----------+

          +---EQ01----+   +---EQ03----+   +---EQ05----+
          |           |   |EX-6    C12|   |EX-10   C20|
          + - - - - - +   + - - - - - +   + - - - - - +
          |           |   |EX-6    C13|   |EX-10   C21|
          + - - - - - +   + - - - - - +   + - - - - - +
          |EX-3    C6 |   |EX-7    C14|   |EX-11   C22|
          + - - - - - +   + - - - - - +   + - - - - - +
          |EX-3    C7 |   |EX-7    C15|   |EX-11   C23|
          +-----------+   +-----------+   +-----------+

  root@ltc-wspoon11:~# ./statedisable.sh
  ./statedisable.sh: line 10: 
/sys/devices/system/cpu/cpu*/cpuidle/state7/disable: No such file or directory
  ./statedisable.sh: line 11: 
/sys/devices/system/cpu/cpu*/cpuidle/state8/disable: No such file or directory

  root@ltc-wspoon11:~# cpupower idle-info
  CPUidle driver: powernv_idle
  CPUidle governor: menu
  analyzing CPU 0:

  Number of idle states: 7
  Available idle states: snooze stop0_lite stop0 stop1 stop2 stop4 stop5
  snooze (DISABLED) :
  Flags/Description: snooze
  Latency: 0
  Usage: 81861
  Duration: 29748269
  stop0_lite (DISABLED) :
  Flags/Description: stop0_lite
  Latency: 1
  Usage: 70
  Duration: 1982345
  stop0 (DISABLED) :
  Flags/Description: stop0
  Latency: 2
  Usage: 274
  Duration: 125896
  stop1 (DISABLED) :
  Flags/Description: stop1
  Latency: 5
  Usage: 36
  Duration: 4922
  stop2 (DISABLED) :
  Flags/Description: stop2
  Latency: 10
  Usage: 3745
  Duration: 88300041
  stop4 (DISABLED) :
  Flags/Description: stop4
  Latency: 100
  Usage: 65
  Duration: 1048951
  stop5 (DISABLED) :
  Flags/Description: stop5
  Latency: 200
  Usage: 30377
  Duration: 61977191643
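
The two "No such file or directory" messages from statedisable.sh appear harmless: cpupower reports seven idle states, which presumably map to state0 through state6, so state7 and state8 do not exist on this machine. The script itself is not included in this report. The following is only a sketch of how such a script could skip missing states, using the standard cpuidle sysfs layout; the function name and the optional root argument are hypothetical.

```shell
# Hypothetical helper, not the statedisable.sh used in this report.
# Writes 1 to every cpuidle "disable" file that exists under the given sysfs
# root and prints how many files were updated.
disable_idle_states() {
    root="${1:-/sys/devices/system/cpu}"
    count=0
    for f in "$root"/cpu[0-9]*/cpuidle/state[0-9]*/disable; do
        # An unmatched glob stays literal; skip it instead of failing.
        [ -e "$f" ] || continue
        echo 1 > "$f" && count=$((count + 1))
    done
    echo "$count"
}
```

Run as root with no argument to act on the live system, then re-check the result with `cpupower idle-info`.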

  root@ltc-wspoon11:~# ./run_workload.sh

  root@ltc-wspoon11:~# ./scom_addr_p9.sh 0x1001080c 15
  EQ[ 3]: 0x1301080c
  EX[ 7]: 0x13010c0c
   C[15]: 0x3f01080c
  root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/getscom -c 0x8 0x13010c0c
  0000000000000000
  root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
  0c00000000000000
  root@ltc-wspoon11:~# ./skiboot/external/xscom-utils/putscom -c 0x8 0x13010c0c 0c00000000000000
  0c00000000000000

  After the machine check error is injected, the Linux host stops responding
  to ping and console access to the machine is lost.

  However, the OpenBMC shell and GUI still show the host in the Running
  state.

  == Comment: #1 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> - 2019-05-07 
23:33:31 ==
  The machine is installed with the Ubuntu 18.04 Linux OS.

  root@ltc-wspoon11:~# uname -a
  Linux ltc-wspoon11 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:26:19 UTC 
2019 ppc64le ppc64le ppc64le GNU/Linux
  root@ltc-wspoon11:~# cat /etc/os-release
  NAME="Ubuntu"
  VERSION="18.04.2 LTS (Bionic Beaver)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 18.04.2 LTS"
  VERSION_ID="18.04"
  HOME_URL="https://www.ubuntu.com/"
  SUPPORT_URL="https://help.ubuntu.com/"
  BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
  PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
  VERSION_CODENAME=bionic
  UBUNTU_CODENAME=bionic
  root@ltc-wspoon11:~# cat /proc/cpuinfo | tail
  cpu             : POWER9, altivec supported
  clock           : 2300.000000MHz
  revision        : 2.3 (pvr 004e 1203)

  timebase        : 512000000
  platform        : PowerNV
  model           : 8335-GTH
  machine         : PowerNV 8335-GTH
  firmware        : OPAL
  MMU             : Radix

  root@ltc-wspoon11:~# lsmcode
  Version of System Firmware :
   Product Name          : OpenPOWER Firmware
   Product Version       : witherspoon-v2.3-rc2-58-g59fd0743
   Product Extra         :        skiboot-v6.3-rc2
   Product Extra         :        bmc-firmware-version-2.03
   Product Extra         :        occ-58e422d
   Product Extra         :        hostboot-19a436e
   Product Extra         :        buildroot-2019.02.2-17-g93b841d204
   Product Extra         :        capp-ucode-p9-dd2-v4
   Product Extra         :        machine-xml-a6f4df3
   Product Extra         :        hostboot-binaries-hw043019a.940
   Product Extra         :        sbe-249671d
   Product Extra         :        hcode-hw040319a.940
   Product Extra         :        petitboot-v1.10.3
   Product Extra         :        linux-5.0.9-openpower1-p3a4d5a4

  == Comment: #3 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> -
  2019-05-07 23:42:35 ==

  I quickly tested MCE on an op930 build (IBM-witherspoon-ibm-OP9-v2.2-3.5)
  with 4.15.0-47-generic and found no hang. On further investigation, the
  hang is seen with kernel version 4.15.0-48-generic and above. It looks like
  changes that went into 4.15.0-48-generic cause the hang. Still
  investigating.

  == Comment: #9 - Application Cdeadmin <cdead...@us.ibm.com> - 2019-05-22 
06:45:07 ==
  ==== State: Working by: jayeshp on 22 May 2019 06:37:27 ====

  Any update?

  == Comment: #11 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 
2019-09-19 04:44:01 ==
  The hang issues should go away with the patch below.

  commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee
  Author: Balbir Singh <bsinghar...@gmail.com>
  Date:   Tue Aug 20 13:43:47 2019 +0530

      powerpc/mce: Fix MCE handling for huge pages

      The current code would fail on huge pages addresses, since the shift would
      be incorrect. Use the correct page shift value returned by
      __find_linux_pte() to get the correct physical address. The code is more
      generic and can handle both regular and compound pages.

      Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
      Signed-off-by: Balbir Singh <bsinghar...@gmail.com>
      [ar...@linux.ibm.com: Fixup pseries_do_memory_failure()]
      Signed-off-by: Reza Arbab <ar...@linux.ibm.com>
      Tested-by: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com>
      Signed-off-by: Santosh Sivaraj <sant...@fossix.org>
      Cc: sta...@vger.kernel.org # v4.15+
      Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190820081352.8641-3-sant...@fossix.org
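
The shift issue described in the commit message can be illustrated with plain address arithmetic. This is a simplified sketch, not the kernel code: it assumes a 64K base page (shift 16) and a 2M huge page (shift 21), and the addresses and the function name `phys_addr` are made up for illustration.

```shell
# phys_addr BASE ADDR SHIFT
# Combines the physical base of a page with the offset of ADDR inside a page
# of size 2^SHIFT. Illustrative arithmetic only; all inputs are hypothetical.
phys_addr() {
    base=$(($1))
    addr=$(($2))
    shift_bits=$3
    mask=$(( (1 << shift_bits) - 1 ))
    printf '0x%x\n' $(( (base & ~mask) | (addr & mask) ))
}
```

With a huge page based at 0x201cc7600000 and a faulting address of 0x7fff12345678, `phys_addr 0x201cc7600000 0x7fff12345678 21` prints 0x201cc7745678, while the same call with the base-page shift of 16 prints 0x201cc7605678. The second result drops part of the offset within the huge page, which is the kind of wrong physical address the patch avoids by using the shift reported for the mapping.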

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1848127/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
