Hi Riccardo, all,

On 17.01.22 21:35, Riccardo Mottola wrote:
> Hi,
>
> Riccardo Mottola wrote:
>> John Paul Adrian Glaubitz wrote:
>>>> Not nice. I started compiling some stuff and the box froze, I connected
>>>> serial console and could not resume due to "Fast Data Access MMU miss"
>>> So, this crash occurs with the latest 5.15 kernel on your T2000?
>> exactly latest kernel.
>>
>> I will retest it with stress-ng as soon as I finish this email and copy
>> the dmesg errors.
>
> wow, running the test suite once or twice, I am able to have the system
> power-cycle... wow
>
> Frank test latest kernel on yours :)

Yesterday I found the time to give Linux 5.15.0-3 a try on my T1000
(UltraSPARC T1) and V210 (US IIIi), but the boot issue is still there,
at least for my use case: the klibc-based tools inside the initramfs
are not able to mount the root FS over NFS (details further below).

But it's still good to see that mounting an on-disk root FS seems to
work now for your T2000, though the instabilities during runtime are not
reassuring.

For me the last good Debian kernel, at least for booting (more on that
shortly), is 5.9.0-5. Both T1000 and V210 boot fine with it (incl.
mounting the root FS via NFS (v3, BTW)). But during operation (tested
with `apt upgrade` on a root FS replicated multiple times for testing
from the same tarball) the V210 crashes (=> kernel panic), while the
T1000 does not. I see the same V210 crash with 5.8.0-3. Doing the same
with kernel 4.19.0-5 running on the V210, no problems are seen, not even
the messages below.

The crash when running 5.9.0-5 or 5.8.0-3 is usually "announced" (or at
least accompanied) by one or more occurrence(s) of the following messages:
```
[...]
[  360.489852] CPU[0]: Cheetah+ D-cache parity error at TPC[00000000005b28c8]
[  360.580300] TPC<bpf_check+0x1f68/0x34e0>
[...]
```
...which should be familiar to UltraSPARC IIIi users with newer kernels
(see for example [1], which shows it for 4.16.x). According to [2] this
error should be recoverable (otherwise it would be followed by a panic
with "Irrecoverable Cheetah+ parity error."). Recovery does seem to work
at first, until it no longer does. But I never see that second message,
so something else must be going wrong.

[1]: https://www.spinics.net/lists/sparclinux/msg21019.html

[2]: https://github.com/torvalds/linux/blob/master/arch/sparc/kernel/traps_64.c#L1767..L1799
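
In case somebody wants to count these events across several runs or
kernels, here is a minimal sketch in Python that pulls the parity
messages out of a saved dmesg/serial log. It assumes the two message
formats shown in the excerpt above, with each message on a single line
(i.e. not wrapped by the mail client); the function name
`find_parity_errors` and the dictionary keys are made up for
illustration.

```python
import re

# Pattern for the "Cheetah+ ?-cache parity error" line as seen in the
# excerpt above. Assumption: the log is unwrapped, one message per line.
PARITY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CPU\[(?P<cpu>\d+)\]: Cheetah\+ "
    r"(?P<cache>[DI])-cache parity error at TPC\[(?P<tpc>[0-9a-f]+)\]"
)
# Pattern for the follow-up line that resolves the TPC to a symbol.
SYMBOL_RE = re.compile(r"\[\s*\d+\.\d+\]\s+TPC<(?P<sym>[^>]+)>")


def find_parity_errors(log_text):
    """Return one dict per parity-error message found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = PARITY_RE.search(line)
        if m:
            events.append({
                "time": float(m["ts"]),
                "cpu": int(m["cpu"]),
                "cache": m["cache"],
                "tpc": m["tpc"],
                "symbol": None,  # filled in by the next TPC<...> line
            })
            continue
        s = SYMBOL_RE.search(line)
        if s and events and events[-1]["symbol"] is None:
            events[-1]["symbol"] = s["sym"]
    return events


if __name__ == "__main__":
    sample = (
        "[  360.489852] CPU[0]: Cheetah+ D-cache parity error at "
        "TPC[00000000005b28c8]\n"
        "[  360.580300] TPC<bpf_check+0x1f68/0x34e0>\n"
    )
    for event in find_parity_errors(sample):
        print(event)
```

Grouping the result by `symbol` should show whether the errors always
hit the same code path or are spread all over the kernel.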

Of course our CPUs' caches don't go pop magically. There is something
broken in the newer kernels (> 4.19.x) for UltraSPARC IIIi (and most
likely all the other related processors, too), apart from the mounting
issues for NFS (see [3] for processors affected by this; update to that:
US II is not affected either).

[3]: https://lists.debian.org/debian-sparc/2021/12/msg00004.html

If I find the time and mood I'll try to bisect this US IIIi specific
issue in the hope that we will eventually get a fix for it, also still
hoping for a fix for [4].

[4]: https://lists.debian.org/debian-sparc/2021/03/msg00045.html

Cheers,
Frank

****

## T1000 ##

```
[...]
[    0.000116] Linux version 5.15.0-3-sparc64-smp (debian-ker...@lists.debian.org) (gcc-11 (Debian 11.2.0-14) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37.90.20220123) #1 SMP Debian 5.15.15-2 (2022-01-30)
[...]
[   12.484314] tg3 0001:03:04.0 enP1p3s4f0: Link is up at 1000 Mbps, full duplex
[   12.484520] tg3 0001:03:04.0 enP1p3s4f0: Flow control is on for TX and on for RX
[   12.484689] IPv6: ADDRCONF(NETDEV_CHANGE): enP1p3s4f0: link becomes ready
[   16.765173] Unable to handle kernel paging request at virtual address 0000612000000000
[   16.765384] tsk->{mm,active_mm}->context = 000000000000006e
[   16.765493] tsk->{mm,active_mm}->pgd = ffff800014af0000
[   16.765650]               \|/ ____ \|/
[   16.765650]               "@'/ .. \`@"
[   16.765650]               /_| \__/ |_\
[   16.765650]                  \__U_/
[   16.765975] nfsmount(374): Oops [#1]
[   16.766167] CPU: 2 PID: 374 Comm: nfsmount Tainted: G            E  5.15.0-3-sparc64-smp #1  Debian 5.15.15-2
[   16.766345] TSTATE: 0000000011001607 TPC: 00000000006a5fe8 TNPC: 00000000006a5fec Y: 00000000    Tainted: G            E
[   16.766642] TPC: <kfree+0x48/0x2c0>
[   16.766704] g0: ffff80000f2e7451 g1: 0000000400000000 g2: 0000600000000000 g3: ffff8001fd786000
[   16.766802] g4: ffff800014245e80 g5: ffff8001fd786000 g6: ffff80000f2e4000 g7: ffff80000f2e7c30
[   16.766983] o0: fffffffffffffffe o1: 00000000006fd714 o2: 0000000000002000 o3: ffff80000f2cbaf8
[   16.767209] o4: 0000000000000008 o5: 0000000000000cc0 sp: ffff80000f2e7491 ret_pc: 00000000006fd6d4
[   16.767292] RPC: <sys_mount+0x74/0x1a0>
[   16.767456] l0: ffff800014398408 l1: ffff8001fedeaa00 l2: 0000000000422db4 l3: 0000000000201e00
[   16.767591] l4: 000000000000029c l5: ffff80010000c1a0 l6: ffff80000f2e4000 l7: 00000000006fd660
[   16.767771] i0: 0000000000000cc0 i1: 0000000000201ff0 i2: 0000000000000001 i3: ffff80000f2e7dd0
[   16.767996] i4: 0000000000000000 i5: 0000612000000000 i6: ffff80000f2e7561 i7: 00000000006fd714
[   16.768079] I7: <sys_mount+0xb4/0x1a0>
[   16.768189] Call Trace:
[   16.768326] [<00000000006fd714>] sys_mount+0xb4/0x1a0
[   16.768456] [<00000000006fd6d4>] sys_mount+0x74/0x1a0
[   16.768628] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
[   16.768856] Disabling lock debugging due to kernel taint
[   16.768917] Caller[00000000006fd714]: sys_mount+0xb4/0x1a0
[   16.769093] Caller[00000000006fd6d4]: sys_mount+0x74/0x1a0
[   16.769316] Caller[0000000000406274]: linux_sparc_syscall+0x34/0x44
[   16.769444] Caller[0000000000100a94]: 0x100a94
[   16.769596] Instruction DUMP:
[   16.769603]  ba074001
[   16.769693]  bb2f7003
[   16.769735]  ba074002
[   16.769775] <c25f6008>
[   16.769865]  84086001
[   16.770037]  82007fff
[   16.770134]  8378841d
[   16.770226]  ba100001
[   16.770315]  c2586008
[   16.770456]
Killed
Begin: Retrying nfs mount ...
[...]
```

## V210 ##

```
[...]
[    0.000168] Linux version 5.15.0-3-sparc64-smp (debian-ker...@lists.debian.org) (gcc-11 (Debian 11.2.0-14) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37.90.20220123) #1 SMP Debian 5.15.15-2 (2022-01-30)
[...]
[   40.241993] tg3 0000:00:02.0 enp0s2f0: Link is up at 1000 Mbps, full duplex
[   40.333591] tg3 0000:00:02.0 enp0s2f0: Flow control is on for TX and on for RX
[   40.428669] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s2f0: link becomes ready
[   44.294909] FS-Cache: Loaded
[   44.397657] RPC: Registered named UNIX socket transport module.
[   44.475650] RPC: Registered udp transport module.
[   44.537450] RPC: Registered tcp transport module.
[   44.599295] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   44.815002] FS-Cache: Netfs 'nfs' registered for caching
mount: Invalid argument
Begin: Retrying nfs mount ... mount: Invalid argument
done.
[...]
```
