You have been subscribed to a public bug:

== Comment: #0 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-17 
06:08:41 ==
---Problem Description---
HMI recoverable error injection tests lead to a system checkstop followed by 
a system dump on Ubuntu 17.04 with kernel 4.10.0-19-generic ppc64le
 
Contact Information = ppaid...@in.ibm.com 
 
---uname output---
#21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
 
Machine Type = PowerNV 8284-22A 
 
---System Hang---
 The system is in dumping state. After the dump finishes, the system will IPL to the OS again.
 
---Debugger---
A debugger is not configured
 

== Comment: #3 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-17 
06:12:51 ==
# uname -a
#21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
# cat /etc/os-release 
NAME="Ubuntu"
VERSION="17.04 (Zesty Zapus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.04"
VERSION_ID="17.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=zesty
UBUNTU_CODENAME=zesty
root@p8wookie:~#

== Comment: #4 - Kevin W. Rudd <ru...@us.ibm.com> - 2017-04-17 11:10:22
==


== Comment: #5 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 
2017-04-17 13:34:03 ==
It looks like the commit below is the culprit:

=======================================
commit 2337d207288f163e10bd8d4d7eeb0c1c75046a0c
Author: Nicholas Piggin <npig...@gmail.com>
Date:   Fri Jan 27 14:24:33 2017 +1000

    powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts
    
    The branch from hmi_exception_early to hmi_exception_realmode must use
    a "relocatable-style" branch, because it is branching from unrelocated
    exception code to beyond __end_interrupts.
    
    Signed-off-by: Nicholas Piggin <npig...@gmail.com>
    Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
=======================================

With the above commit, hmi_exception_realmode() is now called using
bctrl, which ends up corrupting the TOC (r2) value, and further accesses
through the new r2 result in unpredictable behaviour.

----------------------------------------
c000000000025f50 <hmi_exception_realmode>:
c000000000025f50:       3a 01 4c 3c     addis   r2,r12,314
c000000000025f54:       b0 01 42 38     addi    r2,r2,432
c000000000025f58:       a6 02 08 7c     mflr    r0
-----------------------------------------

With the above commit, the hmi_exception_early() code jumps to
c000000000025f50 (hmi_exception_realmode+0x0), which then sets up a new
value for r2.

If we revert the above commit, the code jumps to c000000000025f58
(hmi_exception_realmode+0x8) and the HMI handler works fine.
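
To make the r2 arithmetic concrete, here is a small Python sketch of the
two TOC-setup instructions shown in the disassembly above. It is only an
illustration: it assumes the usual ELFv2 convention that r12 holds the
function's own address at the global entry point (+0x0), and the "stale"
r12 value is a made-up number, not one taken from the failing system.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(value):
    """Treat a 16-bit immediate as signed, as addis/addi do."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def global_entry_r2(r12, ha=314, lo=432):
    """Model 'addis r2,r12,314' followed by 'addi r2,r2,432'."""
    r2 = (r12 + (sign_extend16(ha) << 16)) & MASK64
    return (r2 + sign_extend16(lo)) & MASK64


# hmi_exception_realmode+0x0, taken from the disassembly above.
ENTRY = 0xC000000000025F50

# Case 1: r12 holds the function's own address, so r2 is derived from it.
good_r2 = global_entry_r2(ENTRY)

# Case 2: r12 holds some unrelated value (hypothetical, for illustration).
STALE_R12 = 0x0000000000004000
bad_r2 = global_entry_r2(STALE_R12)

# The local entry point skips the two 4-byte TOC-setup instructions.
local_entry = ENTRY + 2 * 4

print(hex(good_r2))      # r2 when r12 == ENTRY
print(hex(bad_r2))       # r2 when r12 is stale
print(hex(local_entry))  # hmi_exception_realmode+0x8
```

Entering at +0x8 leaves the caller's r2 untouched, which matches the
observation that the handler works when the branch lands there. Entering
at +0x0 gives a sensible r2 only if r12 was set to the function address
beforehand; otherwise r2 is off by exactly the error in r12.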

After reverting the above patch I no longer see this issue. I have
rebuilt the Ubuntu kernel after reverting the above patch and you can find
the kernel rpm at:

Can you please retry your tests with the above kernel and see if the issue
still persists?

== Comment: #6 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 
2017-04-17 23:02:31 ==
Spoke to Michael Ellerman this morning. He helped me identify the root cause; 
the fix patch is below:

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 857bf7c5b946..7cfeb8768587 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -982,7 +982,7 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
        EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
        EXCEPTION_PROLOG_COMMON_3(0xe60)
        addi    r3,r1,STACK_FRAME_OVERHEAD
-       BRANCH_LINK_TO_FAR(r4, hmi_exception_realmode)
+       BRANCH_LINK_TO_FAR(r12, hmi_exception_realmode)
        /* Windup the stack. */
        /* Move original HSRR0 and HSRR1 into the respective regs */
        ld      r9,_MSR(r1)

== Comment: #7 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
2017-04-18 01:52:03 ==


== Comment: #8 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 
01:53:57 ==
Hi Mahesh
Tested all the HMI recoverable errors on the patched kernel below and attached 
the corresponding execution logs. All tests are working fine.

#21 SMP Mon Apr 17 12:58:30 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux


Thanks

== Comment: #9 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 
2017-04-18 06:07:56 ==
(In reply to comment #8)
> Hi Mahesh
> Tested all the HMI Recoverable errors on the below patched kernel, attached
> the corresponding executing logs. All tests are working fine.
> 
> Linux p8wookie 4.10.0-19.bz153487-generic #21 SMP Mon Apr 17 12:58:30 EDT
> 2017 ppc64le ppc64le ppc64le GNU/Linux
> 
> 
> Thanks

Thanks. Michael has posted a fix for this upstream.

http://patchwork.ozlabs.org/patch/751647/

I will rebuild the Ubuntu kernel with the above patch.

== Comment: #12 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 
09:27:59 ==
(In reply to comment #11)
> > 
> > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
> 
> I have built new kernel with above patch and you can find it below path
> 
>:/home2/mahesh/u2/bz153487v2/linux-image-4.10.0-19.bz153487v2-
> generic_4.10.0-19.bz153487v2.21_ppc64el.deb


Tested with this new patched kernel, all tests are working fine.

Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18 07:43:13
EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Will attach the full execution logs here.

== Comment: #13 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
2017-04-18 09:29:43 ==


== Comment: #14 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 
2017-04-19 03:52:18 ==
(In reply to comment #12)
> (In reply to comment #11)
> > > 
> > > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
> > 

Thanks for testing. We need to mirror this to Ubuntu so that the fix
patch can be included.

> 
> Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18 07:43:13 EDT
> 2017 ppc64le ppc64le ppc64le GNU/Linux
> 
> Will attach is full the execution logs here.

** Affects: ubuntu-power-systems
     Importance: High
     Assignee: Canonical Kernel Team (canonical-kernel-team)
         Status: In Progress

** Affects: linux (Ubuntu)
     Importance: High
     Assignee: Manoj Iyer (manjo)
         Status: In Progress


** Tags: architecture-ppc64le bugnameltc-153487 severity-critical 
targetmilestone-inin1704 ubuntu-17.04
-- 
[LTCTest][Opal][FW860.20] HMI recoverable errors failed to recover and system 
goes to dump state.
https://bugs.launchpad.net/bugs/1684054
You received this bug notification because you are a member of Kernel Packages, 
which is subscribed to linux in Ubuntu.
