Hi,

On Mon, Apr 24, 2023 at 9:58 PM Chen-Yu Tsai <we...@chromium.org> wrote:
>
> On Mon, Apr 24, 2023 at 11:42 PM Doug Anderson <diand...@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Apr 24, 2023 at 5:54 AM Daniel Thompson
> > <daniel.thomp...@linaro.org> wrote:
> > >
> > > On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> > > > From: Colin Cross <ccr...@android.com>
> > > >
> > > > Implement a hardlockup detector that can be enabled on SMP systems
> > > > that don't have an arch provided one or one implemented atop perf by
> > > > using interrupts on other cpus. Each cpu will use its softlockup
> > > > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > > > verifying that a counter is increasing.
> > > >
> > > > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > > > easily provide a backtrace on the CPU that locked up. It relies on
> > > > some other mechanism in the system to get information about the locked
> > > > up CPUs. This could be support for NMI backtraces like [1], it could
> > > > be a mechanism for printing the PC of locked CPUs like [2], or it
> > > > could be something else.
> > > >
> > > > This style of hardlockup detector originated in some downstream
> > > > Android trees and has been rebased on / carried in ChromeOS trees for
> > > > quite a long time for use on arm and arm64 boards. Historically on
> > > > these boards we've leveraged mechanism [2] to get information about
> > > > hung CPUs, but we could move to [1].
> > >
> > > On the Arm platforms is this code able to leverage the existing
> > > infrastructure to extract status from stuck CPUs:
> > > https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
> >
> > Yup! I wasn't explicit about this, but that's where you end up if you
> > follow the whole bug tracker item that was linked as [2].
> >
> > Specifically, we used to have downstream patches in the ChromeOS that
> > just reached into the coresight range from a SoC specific driver and
> > printed out the CPU_DBGPCSR. When Brian was uprevving rk3399
> > Chromebooks he found that the equivalent functionality had made it
> > upstream in a generic way through the coresight framework. Brian
> > confirmed it was working on rk3399 and made all of the device tree
> > changes needed to get it all hooked up, so (at least for that SoC) it
> > should work on that SoC.
> >
> > [2] https://issuetracker.google.com/172213129
>
> IIRC with the coresight CPU debug driver enabled and the proper DT nodes
> added, the panic handler does dump out information from the hardware.
> I don't think it's wired up for hung tasks though.

Yes, that's correct. The coresight CPU debug driver doesn't work for
hung tasks because it can't get a real stack crawl. All it can get is
the PC of the last branch that the CPU took. This is why combining
${SUBJECT} patch with the ability to get stack traces via pseudo-NMI
is superior.

That being said, even with just the coresight CPU debug driver,
${SUBJECT} patch is still helpful because (assuming "hardlockup_panic"
is set) we'll do a panic which will then trigger the coresight CPU
debug driver. :-)

-Doug

_______________________________________________
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
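
For anyone skimming the thread, the buddy check described in the quoted
commit message ("each cpu will use its softlockup hrtimer to check that
the next cpu is processing hrtimer interrupts by verifying that a
counter is increasing") can be sketched roughly as below. This is only a
user-space simulation in Python, not the patch's code: the class name,
method names and the one-check-per-tick cadence are all assumptions made
for illustration, and none of the identifiers are kernel symbols.

```python
# Illustrative user-space simulation only. Every name here is hypothetical
# and chosen for this sketch; none of them are kernel symbols.

class BuddyDetector:
    def __init__(self, num_cpus):
        # Per-CPU count of hrtimer interrupts handled so far.
        self.hrtimer_count = [0] * num_cpus
        # Value of each CPU's counter the last time its checker looked at it.
        # Start at -1 so the very first check never reports a lockup.
        self.last_seen = [-1] * num_cpus

    def next_cpu(self, cpu):
        # The "buddy" is simply the next CPU, wrapping around at the end.
        return (cpu + 1) % len(self.hrtimer_count)

    def timer_tick(self, cpu):
        """Simulate one hrtimer interrupt on `cpu`.

        Returns the buddy's CPU number if its counter has not advanced
        since the previous check, otherwise None.
        """
        self.hrtimer_count[cpu] += 1
        buddy = self.next_cpu(cpu)
        if self.hrtimer_count[buddy] == self.last_seen[buddy]:
            return buddy  # the buddy took no timer interrupts: looks locked up
        self.last_seen[buddy] = self.hrtimer_count[buddy]
        return None
```

In this sketch, if CPU 2 of four stops calling timer_tick(), CPU 1 is
the one that eventually gets 2 back, since CPU 1 is the CPU watching
it. Note that all the watcher learns is which CPU is stuck; getting a
PC or backtrace out of that CPU still needs one of the mechanisms
discussed above.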