** Changed in: linux (Ubuntu)
Assignee: Joseph Salisbury (jsalisbury) => Khaled El Mously (kmously)
** Changed in: linux (Ubuntu Bionic)
Assignee: Joseph Salisbury (jsalisbury) => Khaled El Mously (kmously)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1789772
Title:
tlbie master timeout checkstop (using NVidia/GPU)
Status in The Ubuntu-power-systems project:
Triaged
Status in linux package in Ubuntu:
Triaged
Status in linux source package in Bionic:
Triaged
Bug description:
A hung state machine in the chip's NMU logic can trigger a fatal
condition that the hardware flags with a checkstop. As a result,
customers who have a POWER9 Witherspoon (equipped with GPUs) will
experience a crash on their server when using NVIDIA's toolkit.
The server will crash with the following hardware failure message:
Unrecoverable Hardware Failure, (Critical) A system checkstop occurred
(AffectedSubsystem: Canister/Appliance, PID: 19703), Resolved: 0
In this case, an `NCUFIR[10] tlbie master timeout` has been observed
just by starting the NVIDIA ATS driver. The issue is triggered because
the NMU logic gets stuck when a page is upgraded from RO -> RW without
a following tlbie.
This is addressed with the following patches:
bd5050e38aec3055ff4257ade987d808ac93b582 powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang
e4c1112c3fc503fc78379fa61450bfda3f0717fe powerpc/mm: Change function prototype
044003b52a78bcbda7103633c351da16505096cf powerpc/mm/radix: Move function from radix.h to pgtable-radix.c
f069ff396d657ac7bdb5de866c3ec28b8d08d953 powerpc/mm/hugetlb: Update huge_ptep_set_access_flags to call __ptep_set_access_flags directly
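
The RO -> RW upgrade described above can be illustrated with a toy
model. This is only a sketch, and it assumes the relaxed sequence works
by marking the PTE invalid, issuing a tlbie, and then writing the
upgraded PTE. The class, the helper names, and the bit values are all
hypothetical; none of them is kernel API, and the model shows only the
stale cached translation, not the hardware hang itself.

```python
# Toy model only: bit values and names are illustrative, not the kernel's.
PAGE_PRESENT = 0x1
PAGE_INVALID = 0x2
PAGE_RW = 0x4


class ToyMMU:
    """One page-table entry plus one cached translation (a stand-in TLB)."""

    def __init__(self, pte):
        self.pte = pte
        self.tlb = None  # cached copy of the PTE, or None if nothing is cached
        self.log = []    # order of operations, for inspection

    def access(self):
        """Simulate a translation: fill the cache if the PTE is present."""
        if self.tlb is None and self.pte & PAGE_PRESENT:
            self.tlb = self.pte
        return self.tlb

    def pte_update(self, clear, set_bits):
        """Clear and set bits in the PTE, returning the old value."""
        old = self.pte
        self.pte = (old & ~clear) | set_bits
        self.log.append("pte_update")
        return old

    def flush_tlb_page(self):
        """Stand-in for a tlbie: drop the cached translation."""
        self.tlb = None
        self.log.append("tlbie")


def relax_access_naive(mmu):
    # Upgrade RO -> RW in place; any cached RO translation stays behind.
    mmu.pte_update(0, PAGE_RW)


def relax_access_with_flush(mmu):
    # Assumed sequence: invalidate the PTE, flush, then write the new PTE.
    old = mmu.pte_update(PAGE_PRESENT, PAGE_INVALID)
    mmu.flush_tlb_page()
    mmu.pte_update(PAGE_INVALID, old | PAGE_RW)
```

In this model the naive path leaves a read-only entry cached after the
PTE has become writable, while the second path ends with the same PTE
value and nothing stale in the cache.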
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1789772/+subscriptions