On Wed, Jun 18, 2025 at 04:54:34PM -0300, K R wrote:
> >Synopsis:      server freezes under heavy CPU usage
> >Category:      kernel
> >Environment:
>       System      : OpenBSD 7.7
>       Details     : OpenBSD 7.7-current (GENERIC.MP) #21: Tue Jun 17 17:40:27 MDT 2025
>                     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>       Architecture: OpenBSD.amd64
>       Machine     : amd64
> >Description:
>
> This machine is a Dell PowerEdge R440 with 16 CPUs and 128GB of RAM.
> It freezes under heavy CPU usage, especially with lots of threads.
> This started with 7.7-release + syspatches but continues with a
> -current as of today.
>
> No panic, nothing, just freezes.  Can't even force into ddb (with
> ddb.console=1).  During last test, top(1) froze with this last output:
>
> load averages: 10.73, 11.12, 10.53                          test 16:46:44
> 125 processes: 93 idle, 32 on processor                up 0 days 00:59:55
> 16 CPUs: 17.7% user, 51.5% nice, 3.6% sys, 1.1% spin, 1.1% intr, 25.0% idle
> Memory: Real: 13G/37G act/tot Free: 87G Cache: 22G Swap: 0K/64G
>
>   PID USERNAME PRI NICE  SIZE   RES STATE     WAIT     TIME     CPU COMMAND
> 60756 root      64    0   13G   13G onproc/1  -       35:05 100.78% python3.12
> 27129 root      10   20 9124K 1532K onproc/2  fsleep   8:31  75.98% semaphore
> 42272 root      10   20 9644K 1540K onproc/4  fsleep   8:36  75.83% semaphore
> 60257 root      10   20 9644K 1560K onproc/14 fsleep   8:32  74.76% semaphore
> 27054 root      64    0  384K  328K onproc/7  -        0:26  71.24% rm
> 58070 root      10    0   15M 4428K sleep/13  fsleep  23:45  36.04% nfdump
> 11522 root      10   20 9636K 1524K onproc/0  fsleep   8:44  31.93% semaphore
> 40359 root      10   20 9648K 1556K onproc/2  fsleep   8:44  29.88% semaphore
> 72237 root      10   20 9632K 1520K onproc/0  fsleep   8:41  27.20% semaphore
> 42031 root      10   20 9648K 1576K onproc/8  fsleep   8:39  27.10% semaphore
> 97960 root      10   20 9644K 1536K onproc/8  fsleep   8:39  26.46% semaphore
> 76525 root      10    0   95M   57M sleep/12  fsleep  10:01  12.84% nfdump
> 68093 root      10   20   96M   64M sleep/3   fsleep  10:07  12.11% nfdump
> 94072 root      -5   20   27M   11M sleep/3   biowait  4:42   1.03% pigz
> 52734 root       2    0 1640K 2740K sleep/4   kqread   0:37   0.98% top
> 84043 root      10   20   27M   11M sleep/3   inode    2:07   0.34% pigz
> 95028 root      10   20   27M   11M sleep/4   inode    2:09   0.15% pigz
> 66823 root      10   20   26M   11M sleep/0   inode    2:07   0.05% pigz
> 59751 root       2    0 2768K 3244K sleep/0   kqread   0:09   0.05% tmux
> 58124 root     -22    0    0K    4K sleep/1   -       37:01   0.00% idle1
> 59513 root     -22    0    0K    4K sleep/2   -       36:27   0.00% idle2
>
> Any recommendations on what could help debugging?

Run a witness kernel.  Uncomment the "#option WITNESS" line in
src/sys/arch/amd64/conf/GENERIC.MP, then rebuild a fresh kernel
after make clean and make config.  Set sysctl kern.witness.watch=2
to get stack traces.

It might report some false positives or known bugs, but maybe it
finds something.  The best we can expect is a panic instead of a
hang.  In that case, the output of "show all locks" in ddb and a
trace on all CPUs would be useful.

> >How-To-Repeat:
>
> Start lots of thread-intensive programs, like pigz(1), nfdump(1), etc.
> I also had a simple C test program using SYSV IPC semaphores running.
> The problem seems to require a reasonable number of CPUs (16 or more)
> to manifest itself.
>
> >Fix:
>
> Unknown.
>
> Thanks,
> --Kor
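
For anyone trying to reproduce this: the semaphore test program mentioned in the report is not included, so the following is only a rough sketch of what such a SysV IPC semaphore stress test might look like in C. The names run_sem_stress and sem_adjust, the parameters, and the choice of fork(2) with a single shared semaphore are all assumptions made for illustration, not the reporter's actual code.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Apply one blocking operation to semaphore 0 of the set: -1 locks, +1 unlocks. */
static int
sem_adjust(int semid, short delta)
{
	struct sembuf op;

	op.sem_num = 0;
	op.sem_op = delta;
	op.sem_flg = 0;
	return semop(semid, &op, 1);
}

/*
 * Hypothetical stress helper (not from the original report): fork nproc
 * children that each perform iters lock/unlock pairs on one SysV semaphore.
 * Returns the final semaphore value (1 if every pair was balanced),
 * or -1 on error.
 */
int
run_sem_stress(int nproc, long iters)
{
	int semid, val, status, i, started = 0, failed = 0;

	semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (semid == -1)
		return -1;

	/* Bring the semaphore to 1 so it acts as a mutex.  GETVAL and
	 * IPC_RMID need no union semun argument, which keeps this portable. */
	val = semctl(semid, 0, GETVAL);
	if (val == -1 || (val != 1 && sem_adjust(semid, (short)(1 - val)) == -1)) {
		semctl(semid, 0, IPC_RMID);
		return -1;
	}

	for (i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid == -1) {
			failed = 1;
			break;
		}
		if (pid == 0) {
			long n;

			/* Child: hammer the semaphore with lock/unlock pairs. */
			for (n = 0; n < iters; n++) {
				if (sem_adjust(semid, -1) == -1 ||
				    sem_adjust(semid, 1) == -1)
					_exit(1);
			}
			_exit(0);
		}
		started++;
	}

	/* Parent: reap every child that was started and check its exit status. */
	for (i = 0; i < started; i++) {
		if (wait(&status) == -1 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed = 1;
	}

	val = semctl(semid, 0, GETVAL);
	if (semctl(semid, 0, IPC_RMID) == -1)
		failed = 1;
	return failed ? -1 : val;
}
```

Calling something like run_sem_stress(16, 1000000) from main() alongside the other workloads would keep many processes contending on the semaphore; whether that is enough to trigger the freeze on its own is not known from the report.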