On 26/8/21 8:10 am, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation 
> physically moved from another computer) has been suddenly stopping all 
> processing about once a month, apparently at random.  It seems to stop 
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
> 
> The system log, after a reboot, shows nothing unusual except of course 
> that there are no log entries for a shut-down.
> 
> Can anyone provide ideas about tracking this down?
> 
> It could of course be a random rare intermittent hardware error.

Sounds like the perfect application for netconsole.

I have a Raspberry Pi that runs a few services, and on it I installed udplogger:
https://lwn.net/Articles/571589/
Run with: /usr/local/bin/udplogger port=6666 dir=/root/udplogs/

I have a number of machines set up with netconsole, either on the kernel 
command line or loaded after boot. There are easier ways to do this, but 
this is what I use (I honestly don't recall why):

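        DEST=192.168.24.218                 # log receiver (runs udplogger)
        # netconsole must be loaded (or built in) for its configfs directory to exist
        mount none -t configfs /sys/kernel/config
        mkdir /sys/kernel/config/netconsole/target1
        pushd /sys/kernel/config/netconsole/target1
        echo 192.168.24.1 > local_ip        # source address on this machine
        echo "$DEST" > remote_ip
        echo br0 > dev_name                 # interface to send from
        # look up the receiver's MAC address
        arping -c1 "$DEST" | grep -o ..:..:..:..:..:.. > remote_mac
        echo 1 > enabled
        popd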
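
Once a target is enabled, it is worth confirming that messages actually 
reach the receiver before the next crash. A minimal sketch, assuming a 
shell run as root: the function name kmsg_test is made up for this example, 
and the optional path argument exists only so the function can be tried 
against an ordinary file. Lines written to /dev/kmsg are injected into the 
kernel log, and the "<3>" prefix requests priority 3 (error), which should 
pass a typical console loglevel.

```shell
#!/bin/sh
# Hypothetical helper (not part of netconsole or udplogger):
# write a marker line into the kernel log so it is forwarded
# to any enabled netconsole target.
kmsg_test() {
    # Default to the real kernel log device; a different path can be
    # given for a dry run against a plain file.
    target=${1:-/dev/kmsg}
    # "<3>" sets the message priority to 3 (error).
    printf '<3>netconsole test from %s\n' "$(uname -n)" > "$target"
}
```

Running kmsg_test with no argument on the sending machine should make the 
marker line show up in the receiver's log directory shortly afterwards. If 
it does not, the console loglevel or the target settings are the first 
things to check.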

Or on the kernel command line:
netconsole=<src-port>@<src-ip>/eth0,<dst-port>@<dst-ip>/ab:cd:ef:12:34:56
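
The kernel's netconsole documentation gives the parameter format as 
src-port@src-ip/dev,dst-port@dst-ip/dst-mac. A minimal sketch that 
assembles the string from its parts follows. The function name 
netconsole_param is made up for this example, and the addresses in the 
usage note are simply the ones from the configfs snippet, not necessarily 
what belongs on any particular machine.

```shell
#!/bin/sh
# Hypothetical helper: print a netconsole= kernel parameter.
# Arguments: src-port src-ip device dst-port dst-ip dst-mac
netconsole_param() {
    src_port=$1
    src_ip=$2
    dev=$3
    dst_port=$4
    dst_ip=$5
    dst_mac=$6
    # Sender side comes first, receiver side second, separated by a comma.
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$dst_port" "$dst_ip" "$dst_mac"
}
```

For example, netconsole_param 6666 192.168.24.1 eth0 6666 192.168.24.218 ab:cd:ef:12:34:56 
prints netconsole=6666@192.168.24.1/eth0,6666@192.168.24.218/ab:cd:ef:12:34:56, 
which can then be appended to the kernel command line in the boot loader 
configuration.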

That way I pretty much always get the oops that never makes it to disk.

2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113147] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113163] CPU: 4 PID: 4109 Comm: kworker/4:1 Not tainted 5.12.10+ #11
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113170] Hardware name: Apple Inc. iMac12,2/Mac-XXXXXXXXXXXXXX, BIOS 87.0.0.0.0 06/14/2019
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113174] Workqueue: events radeon_dp_work_func [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113229] Call Trace:
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113232]  dump_stack+0x64/0x7c
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113237]  panic+0xf6/0x280
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113241]  ? radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113267]  __stack_chk_fail+0x10/0x10
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113271]  radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113297]  radeon_connector_hotplug+0xa8/0xe0 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113315]  radeon_dp_work_func+0x28/0x40 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113335]  process_one_work+0x1c4/0x310
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113339]  worker_thread+0x240/0x3c0
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113341]  ? wq_update_unbound_numa+0x10/0x10
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113344]  kthread+0x10a/0x120
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113346]  ? kthread_park+0x80/0x80
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113348]  ret_from_fork+0x1f/0x30
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113391] Kernel Offset: disabled
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113393] Rebooting in 10 seconds..
2021-07-09 11:19:24 192.168.24.187:6666 [1076334.114131] ACPI MEMORY or I/O RESET_REG.
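
With several machines logging to one receiver, the crash headlines are easy 
to lose among routine kernel messages. A minimal sketch for pulling them 
out, assuming the receiver stores plain-text lines in the format shown 
above (timestamp, sender address, kernel message). The function name 
find_crashes and the patterns it searches for are choices made for this 
example, and the file layout under the log directory is not something the 
output above confirms.

```shell
#!/bin/sh
# Hypothetical helper: print panic/oops headline lines from one or more
# captured log files. Each matching line keeps its timestamp and sender
# prefix, so it shows which machine crashed and when.
find_crashes() {
    # -h suppresses file names so the output stays in log format;
    # -E enables the alternation between the three patterns.
    grep -hE 'Kernel panic|Oops:|BUG:' "$@"
}
```

Called as find_crashes followed by the log files of interest, it prints 
only the headline lines. The surrounding call trace can then be read from 
the same file around the reported timestamp.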

Regards,
Brad