(Sorry for the messy quoting; I'm not actually subscribed to the list, so
I didn't see this reply until I thought to check the ML archives.)

> If the stack is corrupted the backtrace may or may not be affected.

Sure, but it happening every time is pretty surprising to me.

> Why not bisect the kernel to find the actual bug?

A) I'm going to try booting various older versions of the kernel, but...
B) I don't actually know that there was a version where the problem
I'm encountering didn't exist, so it's a relatively open search, and
C) Actually compiling kernels on this hardware will take an age each
time, so I was hoping to get better insight into the bug through a
stacktrace.
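
In case it helps anyone staring at the same output, here is a rough
Python sketch for splitting the RED State dump quoted below into one
record per trap level. The name parse_red_state is made up for
illustration, and it assumes nothing beyond the NAME=xxxx.xxxx.xxxx.xxxx
layout visible in that dump; it does not try to interpret the trap
types.

```python
import re

# Matches fields such as "TPC=0000.0000.0042.4200" as they appear in the dump.
FIELD = re.compile(r"(TL|TT|TPC|TnPC|TSTATE)=([0-9a-fA-F.]+)")


def parse_red_state(text):
    """Return one dict per trap level, in the order the dump lists them.

    Hypothetical helper: each "TL=" field is assumed to start a new record,
    which matches the layout of the dump in this thread.
    """
    records = []
    for name, value in FIELD.findall(text):
        number = int(value.replace(".", ""), 16)  # drop the dot grouping
        if name == "TL":
            records.append({})
        if records:  # ignore stray fields that appear before any TL=
            records[-1][name] = number
    return records
```

Something like `for r in parse_red_state(dump): print(r["TL"], hex(r["TPC"]))`
then lists the trapping PC at each level, which can be looked up in
System.map by hand.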

- Rich

On Wed, May 12, 2021 at 10:49 PM Rich <rincebr...@gmail.com> wrote:
>
> Hi all,
> So, I got my earlier system running sparc64 using a terrible method
> (from inside the existing sparc install, mount -o remount,ro /; nc -l
> | dd of=/dev/sda [...] an image generated in a VM, reboot and pray),
> but now I'm doing the thing I actually wanted a sparc64 system for
> (testing a kernel module on sparc64), and encountering a problem.
>
> While running through its test suite, when it runs through a certain
> suite of tests, every time (so far) it dies in the same annoying
> fashion:
> [ 1435.191913] Kernel panic - not syncing: corrupted stack end detected inside scheduler
> [ 1435.294939] CPU: 0 PID: 722 Comm: spl_system_task Tainted: P
>    OE     5.10.0-6-sparc64 #1 Debian 5.10.28-1
> [ 1435.431126] Call Trace:
> [ 1435.463267] Press Stop-A (L1-A) from sun keyboard or send break
> [ 1435.463267] twice on console to return to the boot prom
> [ 1435.609777] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
>
> RED State Exception
>
> TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
>    TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
> TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
>    TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
> TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
>    TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
> TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
>    TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
> TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
>    TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606
>
>
> Watchdog Reset
> Externally Initiated Reset
> ok
>
> (Sometimes it winds up so disgruntled that the watchdog reset never
> triggers and sending break twice on the console doesn't work, so you
> need to physically power cycle it.)
>
> I'm mostly curious about whether anyone knows why the Call Trace might
> be empty - I see the message about corrupted stack end above it, but
> from what I can see online, plenty of people get that message and a
> call trace printout below it (...on other architectures, at least).
> https://lists.debian.org/debian-sparc/2016/09/msg00002.html is even an
> example of someone on this very list.
>
> Does anyone have any insights? Or am I going to have to resort to
> printks in random parts of the thread the panic notes and hope I find
> the problem?
>
> Thanks!
> - Rich
