Replying to myself.

I ran the old 5.9 kernel from Debian, which has proven quite stable.
I ran the same tests, and I did indeed find one error on the console.


[  380.918996] Unable to handle kernel NULL pointer dereference
[  380.919198] tsk->{mm,active_mm}->context = 000000000000057d
[  380.919326] tsk->{mm,active_mm}->pgd = ffff8003f1fd4000
[  380.919496]               \|/ ____ \|/
                             "@'/ .. \`@"
                             /_| \__/ |_\
                                \__U_/
[  380.919510] stress-ng(1529): Oops [#287]
[  380.919536] CPU: 24 PID: 1529 Comm: stress-ng Tainted: G      D     E
 X  5.9.0-5-sparc64-smp #1 Debian 5.9.15-1
[  380.919557] TSTATE: 0000008811001602 TPC: 000000000042d8e0 TNPC:
000000000042d8e4 Y: 00000000    Tainted: G      D     E  X
[  380.919587] TPC: <do_signal+0x440/0x560>
[  380.919604] g0: ffff800100ef7194 g1: 0000000000000328 g2:
0000000000000000 g3: ffff80010002c000
[  380.919620] g4: ffff8003cf6f6b40 g5: ffff8003fdea4000 g6:
ffff8003cf9cc000 g7: 0000000000004000
[  380.919634] o0: 00000000000001e8 o1: 0000000000000328 o2:
ffff8003cf9cc000 o3: 0000000000000007
[  380.919650] o4: 0000000000000007 o5: fffffffffffffff2 sp:
ffff8003cf9cf451 ret_pc: 000000000042d8c4
[  380.919673] RPC: <do_signal+0x424/0x560>
[  380.919690] l0: 0208000104000004 l1: 00000044f0000226 l2:
ffff800100ef7194 l3: 0000000000000000
[  380.919705] l4: 0000000000000000 l5: 0000000000000005 l6:
ffff8003cf9cc000 l7: 0000000000698c20
[  380.919719] i0: 0000000000000070 i1: 0000000000000208 i2:
fffffffffffffff2 i3: ffff8003cf9eff70
[  380.919732] i4: fffffffffffffff2 i5: 0000000000000000 i6:
ffff8003cf9cf4d1 i7: 000000000042d6fc
[  380.919752] I7: <do_signal+0x25c/0x560>
[  380.919760] Call Trace:
[  380.919783] [<000000000042d6fc>] do_signal+0x25c/0x560
[  380.919806] [<000000000042e218>] do_notify_resume+0x58/0xa0
[  380.919828] [<0000000000404b48>] __handle_signal+0xc/0x30
[  380.919852] Caller[000000000042d6fc]: do_signal+0x25c/0x560
[  380.919874] Caller[000000000042e218]: do_notify_resume+0x58/0xa0
[  380.919893] Caller[0000000000404b48]: __handle_signal+0xc/0x30
[  380.919910] Caller[ffff800100ef716c]: 0xffff800100ef716c
[  380.919916] Instruction DUMP:
[  380.919923]  c029a00d
[  380.919930]  b4168008
[  380.919938]  900761e8
[  380.919945] <d25e2070>
[  380.919952]  40014fef
[  380.919959]  b416801c
[  380.919965]  c2592468
[  380.919972]  b8100008
[  380.919979]  920126c8

[  380.972358] systemd-journald[66048]: File
/var/log/journal/bdb2a41ce825489ba567bea53add247e/system.journal
corrupted or uncleanly shut down, renaming and replacing.
[  407.494981] systemd[1]: Started Journal Service.


There were also errors in the stressors:
stress-ng: info:  [12989] stress-ng-fanotify: 148 open, 41 close write,
128 close nowrite, 96 access, 27 modify
stress-ng: info:  [12908] stress-ng-fanotify: 159 open, 66 close write,
108 close nowrite, 88 access, 43 modify
stress-ng: info:  [12911] stress-ng-fanotify: 147 open, 43 close write,
122 close nowrite, 99 access, 20 modify
stress-ng: info:  [13079] stress-ng-fanotify: 159 open, 60 close write,
112 close nowrite, 97 access, 32 modify
stress-ng: info:  [12820] stress-ng-fanotify: 155 open, 46 close write,
123 close nowrite, 87 access, 27 modify
stress-ng: info:  [913] unsuccessful run completed in 282.58s (4 mins,
42.58 secs)
stress-ng: fail:  [913] chattr instance 2 corrupted bogo-ops counter, 48
vs 0
stress-ng: fail:  [913] chattr instance 2 hash error in bogo-ops counter
and run flag, 1918819509 vs 0
stress-ng: fail:  [913] chattr instance 6 corrupted bogo-ops counter, 50
vs 0
stress-ng: fail:  [913] chattr instance 6 hash error in bogo-ops counter
and run flag, 506138270 vs 0
stress-ng: fail:  [913] dnotify instance 4 corrupted bogo-ops counter,
224 vs 0
info: 5 failures reached, aborting stress process
stress-ng: fail:  [913] dnotify instance 4 hash error in bogo-ops
counter and run flag, 1503783545 vs 0
stress-ng: fail:  [913] dnotify instance 6 corrupted bogo-ops counter,
222 vs 0
stress-ng: fail:  [913] dnotify instance 6 hash error in bogo-ops
counter and run flag, 4199465241 vs 0
stress-ng: fail:  [913] metrics-check: stressor metrics corrupted, data
is compromised


However, the machine did not crash.
I ran exactly the same stress command again, and the failures are
reproducible, so I suspect the tests may not be 64-bit big-endian safe,
or something along those lines.
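
To illustrate the kind of problem I have in mind (this is only a sketch
of one common class of endianness bug, not code taken from stress-ng,
and I have not confirmed that this is what happens there): if a 64-bit
counter is written to memory and something later reads only the first
4 bytes as a 32-bit value, the result is correct on little endian but
zero on big endian for small counts. The function name and the value
1000 below are made up for the example.

```python
import struct


def read_first_word(counter: int, byte_order: str) -> int:
    """Store a 64-bit counter, then read back only its first 4 bytes.

    Hypothetical helper: mimics code that mixes 64-bit and 32-bit
    accesses to the same memory location.
    """
    prefix = {"little": "<", "big": ">"}[byte_order]
    raw = struct.pack(prefix + "Q", counter)          # 8 bytes as laid out in memory
    (value,) = struct.unpack(prefix + "I", raw[:4])   # first 4 bytes as unsigned 32-bit
    return value


# Little endian: the low-order bytes come first, so the value survives.
little = read_first_word(1000, "little")
# Big endian: the first 4 bytes are the (zero) high-order half.
big = read_first_word(1000, "big")
print(little, big)
```

On a little-endian machine such a bug goes unnoticed, which would be
one possible reason for tests passing on x86 and failing on sparc64.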
