I'm trying to track down a problem that's been getting worse as my Dell 2850 main home server has been getting more loaded down with work. From time to time, the machine will lock up temporarily: it doesn't respond to ICMP ECHO, and it doesn't echo characters typed on the console. It will sit like that for three or four minutes, and then continue running. Nothing is logged, other than messages that are consequences of the hang.

Over the last couple of days, I've been at the console twice when it's happened, and have hit the machine's interrupt button to get it to drop into ddb, so I could get backtraces. In both cases, I interrupted it in x86_pause(), where it was waiting on a spinlock during a call to uvm_pagealloc_strat(). I thought the 'cpu' command to ddb should switch between processors, so I could get backtraces from each, but ddb didn't recognize that command.

Here are the two backtraces, hand copied:

    x86_pause
    uvm_pagealloc_strat
    uvmfault_promote
    uvm_fault_internal
    trap

    x86_pause
    mutex_spin_retry
    uvm_pagealloc_strat
    uvm_km_kmem_alloc
    ufs_readdir
    VOP_READDIR
    vn_readdir
    sys___getdents30
    syscall

The box has 8GB of RAM, and is a VLAN and VPN router, database server, NFS server, mail server, web server, and a number of other things. It tends to have a load well under 1, though, and most of its RAM is used as file cache, so it's really not very heavily loaded. It's running NetBSD/amd64-current as of Oct 31.

I'm looking at the output of things like vmstat, systat, and others, but I could really do with some ideas for where to look and what to look for. My assumption is that I'm after some reason why the system should suddenly be taking several minutes to free up some memory, when it's obviously got much more than it needs to begin with. :)

-tih
-- 
It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong. -Richard Feynman
