On 26/8/21 8:10 am, Hendrik Boom wrote:
> For the past few months my home server (running an ascii installation
> physically moved from another computer) has been suddenly stopping all
> processing about once a month, apparently at random. It seems to stop
> instantly, leaving power on and becoming completely unresponsive to ping,
> existing ssh connexions and use of the physical keyboard.
>
> The system log, after a reboot, shows nothing unusual except of course
> that there are no log entries for a shut-down.
>
> Can anyone provide ideas about tracking this down?
>
> It could of course be a random rare intermittent hardware error.

Sounds like the perfect application for netconsole.

I have a Raspberry Pi that runs some stuff, and on that I installed udplogger:
https://lwn.net/Articles/571589/

Run with:

    /usr/local/bin/udplogger port=6666 dir=/root/udplogs/

I have a number of machines set up with netconsole, either on the kernel
command line or loaded after boot. There are easier ways to do this, but for
whatever reason (I honestly don't recall) this is what I use:

    # Needs root, with netconsole already loaded (or built in)
    DEST=192.168.24.218
    mount none -t configfs /sys/kernel/config
    mkdir /sys/kernel/config/netconsole/target1
    pushd /sys/kernel/config/netconsole/target1
    echo 192.168.24.1 > local_ip
    echo $DEST > remote_ip
    echo br0 > dev_name
    # Look up the MAC address of the logging host
    arping -c1 $DEST | grep -o ..:..:..:..:..:.. > remote_mac
    echo 1 > enabled
    popd

Or on the kernel command line:

    [email protected]/eth0,[email protected]/ab:cd:ef:12:34:56

That way I pretty much always get the oops that never makes it to disk.

2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113147] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113163] CPU: 4 PID: 4109 Comm: kworker/4:1 Not tainted 5.12.10+ #11
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113170] Hardware name: Apple Inc. iMac12,2/Mac-XXXXXXXXXXXXXX, BIOS 87.0.0.0.0 06/14/2019
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113174] Workqueue: events radeon_dp_work_func [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113229] Call Trace:
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113232] dump_stack+0x64/0x7c
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113237] panic+0xf6/0x280
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113241] ? radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113267] __stack_chk_fail+0x10/0x10
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113271] radeon_dp_needs_link_train+0x69/0x70 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113297] radeon_connector_hotplug+0xa8/0xe0 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113315] radeon_dp_work_func+0x28/0x40 [radeon]
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113335] process_one_work+0x1c4/0x310
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113339] worker_thread+0x240/0x3c0
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113341] ? wq_update_unbound_numa+0x10/0x10
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113344] kthread+0x10a/0x120
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113346] ? kthread_park+0x80/0x80
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113348] ret_from_fork+0x1f/0x30
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113391] Kernel Offset: disabled
2021-07-09 11:19:14 192.168.24.187:6666 [1076324.113393] Rebooting in 10 seconds..
2021-07-09 11:19:24 192.168.24.187:6666 [1076334.114131] ACPI MEMORY or I/O RESET_REG.

Regards,
Brad
_______________________________________________
Dng mailing list
[email protected]
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
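
For anyone putting together the kernel command line form by hand, here is a rough sketch of how the pieces fit. It follows the layout given in the kernel's netconsole documentation (source port @ source IP / device, then target port @ target IP / target MAC). The helper name netconsole_param is made up for illustration. The addresses are borrowed from the configfs commands above, 6665 is the documented default source port, and eth0 and the MAC are placeholders, so none of this is necessarily what is on the command line of the machines mentioned here.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): build a netconsole= boot
# parameter from its six fields, in the order the kernel documents them:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    src_port=$1   # UDP source port on the machine being watched
    src_ip=$2     # its IP address
    dev=$3        # network interface to send from
    tgt_port=$4   # port the log receiver listens on
    tgt_ip=$5     # IP address of the log receiver
    tgt_mac=$6    # MAC address of the receiver (or of the gateway to it)
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Assumed example values; substitute your own addresses and interface.
netconsole_param 6665 192.168.24.1 eth0 6666 192.168.24.218 ab:cd:ef:12:34:56
```

The target port should match whatever the receiver listens on (port=6666 in the udplogger invocation above).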
