** Tags added: triage-g

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1684054

Title:
  [LTCTest][Opal][FW860.20] HMI recoverable errors failed to recover and
  system goes to dump state.

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Zesty:
  New

Bug description:
  == Comment: #0 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-17 06:08:41 ==
  ---Problem Description---
  HMI recoverable error injection tests lead to a system checkstop followed
  by a system dump on Ubuntu 17.04 with kernel 4.10.0-19-generic (ppc64le).
   
  Contact Information = ppaid...@in.ibm.com 
   
  ---uname output---
  #21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = PowerNV 8284-22A 
   
  ---System Hang---
   The system is in the dumping state. After the dump finishes, the system
   will IPL to the OS again.
   
  ---Debugger---
  A debugger is not configured
   

  == Comment: #3 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-17 06:12:51 ==
  # uname -a
  #21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
  # cat /etc/os-release 
  NAME="Ubuntu"
  VERSION="17.04 (Zesty Zapus)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 17.04"
  VERSION_ID="17.04"
  HOME_URL="https://www.ubuntu.com/"
  SUPPORT_URL="https://help.ubuntu.com/"
  BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
  PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
  VERSION_CODENAME=zesty
  UBUNTU_CODENAME=zesty
  root@p8wookie:~#

  == Comment: #4 - Kevin W. Rudd <ru...@us.ibm.com> - 2017-04-17
  11:10:22 ==

  
  == Comment: #5 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> -
  2017-04-17 13:34:03 ==
  It looks like the commit below is the culprit:

  =======================================
  commit 2337d207288f163e10bd8d4d7eeb0c1c75046a0c
  Author: Nicholas Piggin <npig...@gmail.com>
  Date:   Fri Jan 27 14:24:33 2017 +1000

      powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts
      
      The branch from hmi_exception_early to hmi_exception_realmode must use
      a "relocatable-style" branch, because it is branching from unrelocated
      exception code to beyond __end_interrupts.
      
      Signed-off-by: Nicholas Piggin <npig...@gmail.com>
      Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
  =======================================

  With the above commit, hmi_exception_realmode() is now called using
  bctrl, which corrupts the TOC (r2) value; any further access through
  the new r2 results in unpredictable behaviour.

  ----------------------------------------
  c000000000025f50 <hmi_exception_realmode>:
  c000000000025f50:       3a 01 4c 3c     addis   r2,r12,314
  c000000000025f54:       b0 01 42 38     addi    r2,r2,432
  c000000000025f58:       a6 02 08 7c     mflr    r0
  -----------------------------------------

  With the above commit, the hmi_exception_early() code jumps to
  c000000000025f50 (hmi_exception_realmode+0x0), which then sets up a new
  value for r2.

  If we revert the above commit, the code jumps to c000000000025f58
  (hmi_exception_realmode+0x8) and the HMI handler works fine.
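
  To illustrate the disassembly above: under the ELFv2 ABI used on
  ppc64le, the two instructions at a function's global entry point derive
  the TOC pointer (r2) from r12, which is expected to hold the function's
  own address. The Python sketch below models only that addis/addi
  arithmetic. The helper names and the stale r12 value are hypothetical
  and chosen for illustration; this is not kernel code.

```python
MASK64 = (1 << 64) - 1


def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed immediate."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def global_entry_toc(r12, hi=314, lo=432):
    """Model 'addis r2,r12,hi' followed by 'addi r2,r2,lo'.

    The defaults are the immediates shown in the disassembly of
    hmi_exception_realmode in this report.
    """
    r2 = (r12 + (sign_extend_16(hi) << 16)) & MASK64  # addis: immediate << 16
    r2 = (r2 + sign_extend_16(lo)) & MASK64           # addi: plain immediate
    return r2


# Address of hmi_exception_realmode+0x0 from the disassembly.
FUNC_ADDR = 0xC000000000025F50

# Case 1: r12 holds the function address, as the global entry point expects.
expected_toc = global_entry_toc(FUNC_ADDR)

# Case 2: r12 holds some unrelated leftover value (hypothetical number).
stale_r12 = 0xC000000000004000
bad_toc = global_entry_toc(stale_r12)

print(hex(expected_toc))
print(hex(bad_toc))
```

  When r12 equals the entry address, the result is the TOC pointer the
  function was linked against. With any other r12 the computed r2 is off
  by the same amount as r12, so every TOC-relative access afterwards reads
  from the wrong place.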

  After reverting the above patch, I no longer see this issue. I have
  rebuilt the Ubuntu kernel with the patch reverted, and you can find
  the kernel rpm at:

  Can you please retry your tests with the above kernel and see if the
  issue still persists?

  == Comment: #6 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> -
  2017-04-17 23:02:31 ==
  Spoke to Michael Ellerman this morning. He helped me identify the root
  cause and the fix patch below:

  diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
  index 857bf7c5b946..7cfeb8768587 100644
  --- a/arch/powerpc/kernel/exceptions-64s.S
  +++ b/arch/powerpc/kernel/exceptions-64s.S
  @@ -982,7 +982,7 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
        EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
        EXCEPTION_PROLOG_COMMON_3(0xe60)
        addi    r3,r1,STACK_FRAME_OVERHEAD
  -     BRANCH_LINK_TO_FAR(r4, hmi_exception_realmode)
  +     BRANCH_LINK_TO_FAR(r12, hmi_exception_realmode)
        /* Windup the stack. */
        /* Move original HSRR0 and HSRR1 into the respective regs */
        ld      r9,_MSR(r1)

  == Comment: #7 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-18 01:52:03 ==

  
  == Comment: #8 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-18 01:53:57 ==
  Hi Mahesh
  Tested all the HMI recoverable errors on the patched kernel below and
  attached the corresponding execution logs. All tests are working fine.

  Linux p8wookie 4.10.0-19.bz153487-generic #21 SMP Mon Apr 17 12:58:30 EDT
  2017 ppc64le ppc64le ppc64le GNU/Linux

  
  Thanks

  == Comment: #9 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> -
  2017-04-18 06:07:56 ==
  (In reply to comment #8)
  > Hi Mahesh
  > Tested all the HMI Recoverable errors on the below patched kernel, attached
  > the corresponding executing logs. All tests are working fine.
  > 
  > Linux p8wookie 4.10.0-19.bz153487-generic #21 SMP Mon Apr 17 12:58:30 EDT
  > 2017 ppc64le ppc64le ppc64le GNU/Linux
  > 
  > 
  > Thanks

  Thanks. Michael has posted a fix for this upstream.

  http://patchwork.ozlabs.org/patch/751647/

  I will rebuild the Ubuntu kernel with the above patch.

  == Comment: #12 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-18 09:27:59 ==
  (In reply to comment #11)
  > > 
  > > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
  > 
  > I have built new kernel with above patch and you can find it below path
  > 
  >:/home2/mahesh/u2/bz153487v2/linux-image-4.10.0-19.bz153487v2-
  > generic_4.10.0-19.bz153487v2.21_ppc64el.deb

  
  Tested with this new patched kernel, all tests are working fine.

  Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18
  07:43:13 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

  Will attach the full execution logs here.

  == Comment: #13 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-18 09:29:43 ==

  
  == Comment: #14 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> -
  2017-04-19 03:52:18 ==
  (In reply to comment #12)
  > (In reply to comment #11)
  > > > 
  > > > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
  > > 

  Thanks for testing. We need to mirror this to Ubuntu so that the fix
  patch can be included.

  > 
  > Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18 07:43:13 EDT
  > 2017 ppc64le ppc64le ppc64le GNU/Linux
  > 
  > Will attach the full execution logs here.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1684054/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
