On 2019-01-14, Marco Prause <[email protected]> wrote:
> after an initial boot, everything is working fine for round about 4 hours.
>
> After 4 hours, it is not possible to log in to the backup/secondary
> openbsd-server via ssh or even via serial console, but it seems to still
> forward traffic correctly. Also the ospf adjacencies are up&running as
> well as ipsec security associations and so on.
>
> Monitoring metrics doesn't show any measured increase of any data.
>
> I've already exchanged the hardware, because it was my first guess, as
> the first server/gateway is running without any problems with the same
> 6.4-stable and config version - but this unfortunately didn't help.

Is it the same or different hardware type and BIOS version for the
working and hanging machines? (maybe diff the two dmesgs)

Same or different filesystem mount options? (Are you using softdep?)

> When I left a serial console login open, I was able to execute some
> commands and also a top, I've invoked before, was still running at the
> failure-state. But when entering e.g. ifconfig, or trying a
> tab-completion also the serial console freezes.

The "WAIT" column of a running top(1) may include useful information.

If possible, run with "sysctl ddb.console=1" (needs setting
pre-securelevel, add it to sysctl.conf if it's not already there),
which should allow you to enter ddb by sending a BREAK signal over
the serial line (~# in cu(1)). You can try that under normal operation
(it will interrupt service; be ready to type "c" and enter to resume)
to check it works.

Then during a hang, attempt to enter ddb. If you are successful,
capture at least the following:

ps
trace

Ideally also switch to all other cpus (the number in the ddb prompt
shows the current one; you can do "mach ddbcpu 3" etc to switch to
another) and re-run trace (which is completely per-cpu) and ps (the
line marked "*" indicates the currently active process on the currently
selected CPU - for a report there's no need to repeat the entire list
N times, but it could be useful to indicate the running processes on
all CPUs).

When you are done with these then also fetch:

sh malloc
sh all pools

For the benefit of other readers who don't have serial console,
ctrl+alt+esc on the keyboard will do the same if the keyboard/monitor
are the selected console device, though obviously it will be harder to
capture the output in an easily readable format!
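
Put together, the whole sequence might look roughly like the sketch
below. This is only an illustration: the serial device (/dev/cuaU0), the
line speed (115200) and the CPU numbers are assumptions, so adjust them
for your setup, and repeat the per-cpu part for every CPU in the machine.

```
# /etc/sysctl.conf on the affected machine (read at boot, so the
# setting is applied before securelevel is raised)
ddb.console=1

# on the machine attached to the serial port
# (device name and speed are assumptions)
$ cu -l /dev/cuaU0 -s 115200
~#                          <- cu(1) escape that sends BREAK
ddb{0}> ps
ddb{0}> trace
ddb{0}> mach ddbcpu 1       <- switch to another cpu
ddb{1}> trace
ddb{1}> ps                  <- only the "*" line needs reporting
ddb{1}> sh malloc
ddb{1}> sh all pools
ddb{1}> c                   <- resume, when only testing that BREAK works
```

Logging the cu(1) session on the attached machine (for example by
running it inside script(1)) makes it easier to include the output in
a report.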

