I reply to myself. I ran the old 5.9 kernel from Debian - which has proven quite stable - and ran the same tests... and indeed I found one error in the console:
[ 380.918996] Unable to handle kernel NULL pointer dereference
[ 380.919198] tsk->{mm,active_mm}->context = 000000000000057d
[ 380.919326] tsk->{mm,active_mm}->pgd = ffff8003f1fd4000
[ 380.919496]               \|/ ____ \|/
                            "@'/ .. \`@"
                            /_| \__/ |_\
                               \__U_/
[ 380.919510] stress-ng(1529): Oops [#287]
[ 380.919536] CPU: 24 PID: 1529 Comm: stress-ng Tainted: G D E X 5.9.0-5-sparc64-smp #1 Debian 5.9.15-1
[ 380.919557] TSTATE: 0000008811001602 TPC: 000000000042d8e0 TNPC: 000000000042d8e4 Y: 00000000 Tainted: G D E X
[ 380.919587] TPC: <do_signal+0x440/0x560>
[ 380.919604] g0: ffff800100ef7194 g1: 0000000000000328 g2: 0000000000000000 g3: ffff80010002c000
[ 380.919620] g4: ffff8003cf6f6b40 g5: ffff8003fdea4000 g6: ffff8003cf9cc000 g7: 0000000000004000
[ 380.919634] o0: 00000000000001e8 o1: 0000000000000328 o2: ffff8003cf9cc000 o3: 0000000000000007
[ 380.919650] o4: 0000000000000007 o5: fffffffffffffff2 sp: ffff8003cf9cf451 ret_pc: 000000000042d8c4
[ 380.919673] RPC: <do_signal+0x424/0x560>
[ 380.919690] l0: 0208000104000004 l1: 00000044f0000226 l2: ffff800100ef7194 l3: 0000000000000000
[ 380.919705] l4: 0000000000000000 l5: 0000000000000005 l6: ffff8003cf9cc000 l7: 0000000000698c20
[ 380.919719] i0: 0000000000000070 i1: 0000000000000208 i2: fffffffffffffff2 i3: ffff8003cf9eff70
[ 380.919732] i4: fffffffffffffff2 i5: 0000000000000000 i6: ffff8003cf9cf4d1 i7: 000000000042d6fc
[ 380.919752] I7: <do_signal+0x25c/0x560>
[ 380.919760] Call Trace:
[ 380.919783] [<000000000042d6fc>] do_signal+0x25c/0x560
[ 380.919806] [<000000000042e218>] do_notify_resume+0x58/0xa0
[ 380.919828] [<0000000000404b48>] __handle_signal+0xc/0x30
[ 380.919852] Caller[000000000042d6fc]: do_signal+0x25c/0x560
[ 380.919874] Caller[000000000042e218]: do_notify_resume+0x58/0xa0
[ 380.919893] Caller[0000000000404b48]: __handle_signal+0xc/0x30
[ 380.919910] Caller[ffff800100ef716c]: 0xffff800100ef716c
[ 380.919916] Instruction DUMP:
[ 380.919923] c029a00d
[ 380.919930] b4168008
[ 380.919938] 900761e8
[ 380.919945] <d25e2070>
[ 380.919952] 40014fef
[ 380.919959] b416801c
[ 380.919965] c2592468
[ 380.919972] b8100008
[ 380.919979] 920126c8
[ 380.972358] systemd-journald[66048]: File /var/log/journal/bdb2a41ce825489ba567bea53add247e/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 407.494981] systemd[1]: Started Journal Service.
as well as errors from the stressors:

stress-ng: info: [12989] stress-ng-fanotify: 148 open, 41 close write, 128 close nowrite, 96 access, 27 modify
stress-ng: info: [12908] stress-ng-fanotify: 159 open, 66 close write, 108 close nowrite, 88 access, 43 modify
stress-ng: info: [12911] stress-ng-fanotify: 147 open, 43 close write, 122 close nowrite, 99 access, 20 modify
stress-ng: info: [13079] stress-ng-fanotify: 159 open, 60 close write, 112 close nowrite, 97 access, 32 modify
stress-ng: info: [12820] stress-ng-fanotify: 155 open, 46 close write, 123 close nowrite, 87 access, 27 modify
stress-ng: info: [913] unsuccessful run completed in 282.58s (4 mins, 42.58 secs)
stress-ng: fail: [913] chattr instance 2 corrupted bogo-ops counter, 48 vs 0
stress-ng: fail: [913] chattr instance 2 hash error in bogo-ops counter and run flag, 1918819509 vs 0
stress-ng: fail: [913] chattr instance 6 corrupted bogo-ops counter, 50 vs 0
stress-ng: fail: [913] chattr instance 6 hash error in bogo-ops counter and run flag, 506138270 vs 0
stress-ng: fail: [913] dnotify instance 4 corrupted bogo-ops counter, 224 vs 0
info: 5 failures reached, aborting stress process
stress-ng: fail: [913] dnotify instance 4 hash error in bogo-ops counter and run flag, 1503783545 vs 0
stress-ng: fail: [913] dnotify instance 6 corrupted bogo-ops counter, 222 vs 0
stress-ng: fail: [913] dnotify instance 6 hash error in bogo-ops counter and run flag, 4199465241 vs 0
stress-ng: fail: [913] metrics-check: stressor metrics corrupted, data is compromised

However, the machine did not crash. I ran exactly the same stress command again... and the failures are reproducible, so I suppose the tests may not be 64-bit big-endian safe or something along those lines.
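To narrow this down, the failing stressors could also be run on their own with verification enabled. A minimal sketch - the instance counts and timeout below are only illustrative, not the exact command I used above, which ran many more stressors:

    # run only the stressors that reported corrupted bogo-ops counters,
    # with verification and metrics enabled (illustrative parameters)
    stress-ng --chattr 8 --dnotify 8 --fanotify 8 --verify --metrics --timeout 300s

If the counters still come back corrupted with only these stressors running, that would support the suspicion that the bogo-ops accounting in the tests themselves is not big-endian safe, rather than a kernel regression.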