** Tags added: kernel-da-key

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1683699

Title:
  [LTCTest][Opal][FW860] Oops: Kernel access of bad area, sig: 11 [#1]
  during frozen PE EEH error injection.

Status in linux package in Ubuntu:
  New

Bug description:
  == Comment: #0 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2016-08-13 08:28:54 ==
  ---Problem Description---
  Installed the latest FW860 firmware (build SV860_028) on P8 PowerNV 8284-22A hardware and installed Ubuntu 16.10 on top of it. During EEH frozen PE error injection, we observed "Oops: Kernel access of bad area, sig: 11 [#1]".
   
  Contact Information = ppaid...@in.ibm.com 
   
  ---uname output---
  Linux lep8b 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:04:07 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = PowerNV 8284-22A 
   
  ---System Hang---
  The system hangs and needs a hard power OFF/ON to bring it back up.
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
  1. Install FW860 firmware at level SV860_028 on P8 PowerNV 8284-22A hardware.
  2. Install Ubuntu 16.10 on top of it.
  3. Inject the following frozen PE EEH error:
     echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
  4. A kernel oops is observed immediately.

   
  *Additional Instructions for ppaid...@in.ibm.com: 
  -Post a private note with access information for the machine the bug is occurring on.


  Call Traces:
  root@lep8b:~# echo 0:0:4:0:0 > /sys/kernel/debug/powerpc/PCI0004/err_injct && lspci -ns 0004:00:00.0; echo $?
  [  271.110859] EEH: Frozen PE#0 on PHB#4 detected
  [  271.110967] EEH: PE location: N/A, PHB location: N/A
  0004:00:00.0 0604: 1014:03dc
  0
  root@lep8b:~# [  277.108098] Unable to handle kernel paging request for data at address 0x00000010
  [  277.108183] Faulting instruction address: 0xc000000000083c7c
  [  277.108198] Oops: Kernel access of bad area, sig: 11 [#1]
  [  277.108253] SMP NR_CPUS=2048 NUMA PowerNV
  [  277.108310] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables leds_powernv ibmpowernv powernv_rng ipmi_powernv uio_pdrv_genirq ipmi_msghandler uio ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure be2net lpfc vxlan ip6_udp_tunnel udp_tunnel scsi_transport_fc ipr
  [  277.109391] CPU: 9 PID: 973 Comm: eehd Not tainted 4.4.0-34-generic #53-Ubuntu
  [  277.109467] task: c000000feb3c2a20 ti: c000000feb408000 task.ti: c000000feb408000
  [  277.109542] NIP: c000000000083c7c LR: c000000000083c78 CTR: c000000000083c20
  [  277.109617] REGS: c000000feb40b760 TRAP: 0300   Not tainted  (4.4.0-34-generic)
  [  277.109691] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28008822  XER: 00000000
  [  277.109880] CFAR: c000000000008468 DAR: 0000000000000010 DSISR: 40000000 SOFTE: 1 
  GPR00: c000000000083c78 c000000feb40b9e0 c0000000015b5d00 0000000000000000 
  GPR04: 0000000000000001 c000000feb40bac0 c000002d74b54220 0000000000000f9f 
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000026 
  GPR12: c000000000083c20 c000000007b45580 c0000000000e63d8 c000002d74c40100 
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d42468 
  GPR24: c000000000d42440 0000000000000100 c000000000036460 0000000000000000 
  GPR28: c00000000161a3f0 0000000000000001 c000002ffff81000 c0000000fe440000 
  [  277.110878] NIP [c000000000083c7c] pnv_eeh_reset+0x5c/0x170
  [  277.110931] LR [c000000000083c78] pnv_eeh_reset+0x58/0x170
  [  277.110981] Call Trace:
  [  277.111009] [c000000feb40b9e0] [c000000000083c78] pnv_eeh_reset+0x58/0x170 (unreliable)
  [  277.111098] [c000000feb40ba60] [c000000000038250] eeh_reset_pe+0xb0/0x1c0
  [  277.111175] [c000000feb40bb00] [c000000000af472c] eeh_reset_device+0xd8/0x228
  [  277.111255] [c000000feb40bba0] [c00000000003c4c0] eeh_handle_normal_event+0x390/0x440
  [  277.111429] [c000000feb40bc20] [c00000000003c964] eeh_handle_event+0x184/0x370
  [  277.111601] [c000000feb40bcd0] [c00000000003cd28] eeh_event_handler+0x1d8/0x1e0
  [  277.111772] [c000000feb40bd80] [c0000000000e64e0] kthread+0x110/0x130
  [  277.111910] [c000000feb40be30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
  [  277.112068] Instruction dump:
  [  277.112143] 60000000 813f0000 ebdf0010 792affe3 408200d4 e95e0250 812a000c 2f890002 
  [  277.112385] 419e0054 7fe3fb78 4bfb7065 60000000 <e9230010> 2fa90000 419e00dc e9290010 
  [  277.112629] ---[ end trace a6aa80c26ba676f6 ]---
  [  277.116859] 
  [  277.116910] Sending IPI to other CPUs
  [  277.118085] IPI complete
  [  277.120271] kexec: waiting for cpu 0 (physical 32) to enter OPAL
   -> smp_release_cpus()
  spinning_secondaries = 191
   <- smp_release_cpus()
   <- setup_system()
  [    0.397633] Kernel panic - not syncing: Out of memory and no killable processes...
  [    0.397633] 
  [    0.397769] CPU: 4 PID: 1 Comm: swapper/1 Not tainted 4.4.0-34-generic #53-Ubuntu
  [    0.397843] Call Trace:
  [    0.397870] [c00000000c583190] [c000000008af983c] dump_stack+0xb0/0xf0 (unreliable)
  [    0.397959] [c00000000c5831d0] [c000000008af5a70] panic+0x100/0x2c0
  [    0.398035] [c00000000c583260] [c000000008231e04] out_of_memory+0x5e4/0x5f0
  [    0.398114] [c00000000c583310] [c00000000823a434] __alloc_pages_nodemask+0xc54/0xc90
  [    0.398204] [c00000000c583500] [c0000000082a0a6c] alloc_page_interleave+0x6c/0xe0
  [    0.398292] [c00000000c583550] [c0000000082a1558] alloc_pages_current+0x138/0x1a0
  [    0.398381] [c00000000c5835a0] [c00000000822cdcc] __page_cache_alloc+0x11c/0x160
  [    0.398470] [c00000000c5835e0] [c00000000822cf84] pagecache_get_page+0x174/0x2a0
  [    0.398558] [c00000000c583650] [c00000000822d4b4] grab_cache_page_write_begin+0x54/0x80
  [    0.398646] [c00000000c583690] [c00000000831d484] simple_write_begin+0x54/0x180
  [    0.398735] [c00000000c5836e0] [c00000000822ca64] generic_perform_write+0x104/0x280
  [    0.398823] [c00000000c583780] [c00000000822ed08] __generic_file_write_iter+0x208/0x250
  [    0.398912] [c00000000c5837e0] [c00000000822ee40] generic_file_write_iter+0xf0/0x280
  [    0.399000] [c00000000c583830] [c0000000082e1844] new_sync_write+0xc4/0x120
  [    0.399076] [c00000000c5838d0] [c0000000082e2640] vfs_write+0xc0/0x230
  [    0.399152] [c00000000c583920] [c0000000082e367c] SyS_write+0x6c/0x110
  [    0.399229] [c00000000c583970] [c000000008ea700c] xwrite+0x4c/0xb4
  [    0.399305] [c00000000c5839b0] [c000000008ea7164] do_copy+0xf0/0x170
  [    0.399381] [c00000000c5839e0] [c000000008ea6774] write_buffer+0x5c/0x88
  [    0.399458] [c00000000c583a10] [c000000008ea67fc] flush_buffer+0x5c/0xf0
  [    0.399534] [c00000000c583a60] [c000000008eea034] __gunzip+0x378/0x470
  [    0.399610] [c00000000c583ae0] [c000000008ea75ac] unpack_to_rootfs+0x1f8/0x34c
  [    0.399699] [c00000000c583ba0] [c000000008ea7910] populate_rootfs+0x94/0x164
  [    0.399775] [c00000000c583c20] [c00000000800b49c] do_one_initcall+0x12c/0x2a0
  [    0.399852] [c00000000c583cf0] [c000000008ea4204] kernel_init_freeable+0x28c/0x37c
  [    0.399940] [c00000000c583dc0] [c00000000800be0c] kernel_init+0x2c/0x160
  [    0.400016] [c00000000c583e30] [c000000008009538] ret_from_kernel_thread+0x5c/0xa4
  [    0.418756] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
  [    0.418756] 


  root@lep8b:~# uname -a
  Linux lep8b 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:04:07 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
  root@lep8b:~# cat /etc/os-release 
  NAME="Ubuntu"
  VERSION="16.10 (Yakkety Yak)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 16.10"
  VERSION_ID="16.10"
  HOME_URL="http://www.ubuntu.com/"
  SUPPORT_URL="http://help.ubuntu.com/"
  BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
  UBUNTU_CODENAME=yakkety
  root@lep8b:~# update_flash -d
  Current firwmare version :
    P side    : FW860.00 (SV860_026)
    T side    : FW860.00 (SV860_028)
    Boot side : FW860.00 (SV860_028)
  root@lep8b:~# cat /sys/firmware/opal/msglog | grep -i skiboot
  [45182541432,5] SkiBoot skiboot-5.3.0-rc2 starting...
  root@lep8b:~# 
  root@lep8b:~# lspci
  0000:00:00.0 PCI bridge: IBM Device 03dc
  0000:01:00.0 RAID bus controller: IBM Obsidian-E PCI-E SCSI controller (rev 01)
  0001:00:00.0 PCI bridge: IBM Device 03dc
  0001:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
  0001:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
  0001:02:08.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
  0001:02:09.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
  0001:03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0001:03:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0001:03:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0001:03:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0001:04:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 01)
  0002:00:00.0 PCI bridge: IBM Device 03dc
  0002:01:00.0 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 10)
  0002:01:00.1 Fibre Channel: Emulex Corporation Lancer-X: LightPulse Fibre Channel Host Adapter (rev 10)
  0003:00:00.0 PCI bridge: IBM Device 03dc
  0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
  0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
  0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
  0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
  0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
  0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
  0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
  0003:04:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 01)
  0003:05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0003:05:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0003:05:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0003:05:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0003:0b:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
  0003:0b:00.1 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
  0004:00:00.0 PCI bridge: IBM Device 03dc
  0005:00:00.0 PCI bridge: IBM Device 03dc
  0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
  0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
  0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
  0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
  0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
  0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
  0006:00:00.0 PCI bridge: IBM Device 03dc
  0006:01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0006:01:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0006:01:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
  0006:01:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

  == Comment: #1 - Milton D. Miller II <milt...@us.ibm.com> - 2016-09-09 19:05:32 ==
  From the opcode, the faulting instruction dereferences offset 0x10 from a base
  pointer, and the DAR was 0x10, so the base pointer was NULL.

  Disassembly of the printed opcodes shows that an out-of-module call was
  made and its result used as a base for a load; the loaded value was
  compared against NULL, and then that loaded value was itself used as a
  base for another load at the same 16-byte offset.
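
  The instruction words in the dump can be decoded to check this reading. Below is a minimal user-space sketch, assuming the Power ISA DS-form encoding used by ld (primary opcode 58 in the top six bits, a word-aligned signed displacement, and extended opcode 0 in the low two bits). decode_ds(), is_ld() and effective_address() are hypothetical helpers written for illustration, not part of any kernel or binutils tool.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Power ISA DS-form instruction such as "ld RT,DS(RA)".
 * Hypothetical helper type for illustration only. */
struct ds_form {
    unsigned opcode; /* primary opcode: top 6 bits */
    unsigned rt;     /* target register: next 5 bits */
    unsigned ra;     /* base register: next 5 bits */
    int32_t disp;    /* sign-extended byte displacement */
    unsigned xo;     /* extended opcode: low 2 bits */
};

/* Split a 32-bit instruction word into its DS-form fields. */
struct ds_form decode_ds(uint32_t insn)
{
    struct ds_form f;
    int32_t d = (int32_t)(insn & 0xfffcu); /* low 2 bits belong to XO */

    if (d & 0x8000)                        /* sign-extend the 16-bit field */
        d -= 0x10000;

    f.opcode = insn >> 26;
    f.rt = (insn >> 21) & 0x1fu;
    f.ra = (insn >> 16) & 0x1fu;
    f.disp = d;
    f.xo = insn & 0x3u;
    return f;
}

/* "ld" is primary opcode 58 with XO = 0. */
int is_ld(struct ds_form f)
{
    return f.opcode == 58 && f.xo == 0;
}

/* Effective address of the load: (RA|0) + disp.
 * For ld, RA = 0 means a literal zero base rather than r0. */
uint64_t effective_address(struct ds_form f, const uint64_t gpr[32])
{
    uint64_t base = f.ra ? gpr[f.ra] : 0;
    return base + (uint64_t)(int64_t)f.disp;
}
```

  Applied to the faulting word e9230010, this yields ld r9,16(r3). GPR03 in the register dump is 0000000000000000, so the effective address is 0x10, matching the DAR. The later word e9290010 decodes to ld r9,16(r9), the second load at the same 16-byte offset, while 2fa90000 (primary opcode 11) is rejected by is_ld().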

  Looking at upstream, eeh_pe_bus_get can return NULL, and pnv_eeh_reset
  passes both the returned bus and bus->parent to pci_is_root_bus, which
  checks the word at offset 16 (the parent pointer) for NULL. In struct
  pci_bus the parent field immediately follows a list head, so its offset
  lines up with the fault.

  Without looking at the full function disassembly, it would appear that
  pnv_eeh_reset needs to handle the case where the bus returned from
  eeh_pe_bus_get is NULL before checking whether the bus or its parent is a root bus.
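
  The offset arithmetic can be illustrated with a simplified model. The structs below are hypothetical stand-ins that keep only the leading fields described here (a two-pointer list head followed by the parent pointer); they are not the kernel's struct pci_bus, and model_is_root_bus() only mirrors the idea of pci_is_root_bus testing the parent pointer for NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a doubly linked list head: two pointers. */
struct model_list_head {
    struct model_list_head *next, *prev;
};

/* Simplified stand-in for a PCI bus: only the leading fields are kept. */
struct model_pci_bus {
    struct model_list_head node;  /* 2 pointers: 16 bytes on a 64-bit target */
    struct model_pci_bus *parent; /* so this lands at offset 16 (0x10) */
};

/* A bus with no parent is treated as a root bus. In compiled code, a NULL
 * bus makes the read of bus->parent target address 0 + offsetof(parent),
 * i.e. 0x10 on a 64-bit target, which is the DAR value in the oops. */
int model_is_root_bus(const struct model_pci_bus *bus)
{
    return bus->parent == NULL;
}

/* Byte offset at which a NULL bus pointer would be dereferenced. */
size_t model_parent_offset(void)
{
    return offsetof(struct model_pci_bus, parent);
}
```

  On a 64-bit target such as ppc64le, model_parent_offset() is 16, so calling model_is_root_bus() with a NULL bus would read from address 0x10, as in the oops. The model_pci_bus layout is an assumption based on the description above, not a copy of the kernel header.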

  == Comment: #2 - Russell Currey <rus...@au1.ibm.com> - 2016-09-11 21:46:21 ==
  Thanks for the details, Milton, you're right. I'll write a patch to fix this in EEH and make sure all eeh_pe_bus_get calls check for failure.

  == Comment: #3 - Russell Currey <rus...@au1.ibm.com> - 2016-09-12 00:19:27 ==

  
  == Comment: #4 - Russell Currey <rus...@au1.ibm.com> - 2016-09-12 00:20:25 ==
  I've attached a patch that should stop the oops; can you test it?

  Note that the failure to find a bus is still an issue whose cause we
  need to track down.
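
  The fix direction described in this thread (check the result of every bus lookup before dereferencing it) can be sketched in user space. lookup_pe_bus() and reset_pe() are hypothetical stand-ins for eeh_pe_bus_get and pnv_eeh_reset, the two reset kinds are placeholders for a root-level reset versus a bridge reset, and returning -EIO for a missing bus is an assumption; the merged patch referenced in Comment #6 is the authoritative change.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified types; not the kernel's definitions. */
struct bus {
    struct bus *parent; /* NULL for a root bus */
};

struct pe {
    struct bus *bus;    /* NULL models a failed bus lookup */
    unsigned addr;      /* PE number, used only for the message */
};

enum reset_kind { RESET_BRIDGE = 0, RESET_ROOT = 1 };

/* Stand-in for a lookup such as eeh_pe_bus_get(): may return NULL. */
static struct bus *lookup_pe_bus(const struct pe *pe)
{
    return pe->bus;
}

static int is_root_bus(const struct bus *bus)
{
    return bus->parent == NULL;
}

/* Reset path that validates the lookup before touching bus or bus->parent. */
int reset_pe(const struct pe *pe)
{
    struct bus *bus = lookup_pe_bus(pe);

    if (bus == NULL) {
        fprintf(stderr, "reset_pe: cannot find PCI bus for PE#%x\n", pe->addr);
        return -EIO;
    }

    /* Short-circuit: bus->parent is only examined when bus is not a root. */
    if (is_root_bus(bus) || is_root_bus(bus->parent))
        return RESET_ROOT;
    return RESET_BRIDGE;
}
```

  Returning an error lets the caller's recovery path give up on the PE instead of oopsing; as noted above, this only avoids the crash and does not explain why the bus lookup failed.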

  == Comment: #5 - Milton D. Miller II <milt...@us.ibm.com> - 2016-09-12 12:36:18 ==
  Originator: There is a second problem: the kdump process failed because it ran out of memory.

  Please open a second defect to investigate that (unless you are aware
  of kdump setup instructions that were not followed).

  You should be able to recreate that via echo c > /proc/sysrq-trigger
  and look for the message:

  [    0.397633] Kernel panic - not syncing: Out of memory and no killable processes...

  [note: it appears to have failed while unpacking the initrd early in the
  dump process on your machine. This may be related to the partition
  definition, such as memory size and distribution policy.]

  == Comment: #6 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-11 07:15:03 ==
  @mamatha
  Please create an Ubuntu mirror request for this; the patches have been merged upstream.
  https://patchwork.ozlabs.org/patch/668552/

  
  Please backport the patches to the respective 16.04.2 and 16.10 kernels.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1683699/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
