On Wed, Jun 18, 2025 at 04:54:34PM -0300, K R wrote:
> >Synopsis: server freezes under heavy CPU usage
> >Category:      kernel
> >Environment:
>         System      : OpenBSD 7.7
>         Details     : OpenBSD 7.7-current (GENERIC.MP) #21: Tue Jun 17 17:40:27 MDT 2025
>                          dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>         Architecture: OpenBSD.amd64
>         Machine     : amd64
> >Description:
> 
> This machine is a Dell PowerEdge R440 with 16 CPUs and 128GB of RAM.
> It freezes under heavy CPU usage, especially with lots of threads.
> This started with 7.7-release + syspatches and still happens with
> -current as of today.
> 
> No panic, nothing, it just freezes.  I can't even break into ddb (with
> ddb.console=1).  During the last test, top(1) froze with this output:
> 
> load averages: 10.73, 11.12, 10.53                                 test 16:46:44
> 125 processes: 93 idle, 32 on processor                       up 0 days 00:59:55
> 16  CPUs: 17.7% user, 51.5% nice,  3.6% sys,  1.1% spin,  1.1% intr, 25.0% idle
> Memory: Real: 13G/37G act/tot Free: 87G Cache: 22G Swap: 0K/64G
> 
>   PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
> 60756 root      64    0   13G   13G onproc/1  -        35:05 100.78% python3.12
> 27129 root      10   20 9124K 1532K onproc/2  fsleep    8:31 75.98% semaphore
> 42272 root      10   20 9644K 1540K onproc/4  fsleep    8:36 75.83% semaphore
> 60257 root      10   20 9644K 1560K onproc/14 fsleep    8:32 74.76% semaphore
> 27054 root      64    0  384K  328K onproc/7  -         0:26 71.24% rm
> 58070 root      10    0   15M 4428K sleep/13  fsleep   23:45 36.04% nfdump
> 11522 root      10   20 9636K 1524K onproc/0  fsleep    8:44 31.93% semaphore
> 40359 root      10   20 9648K 1556K onproc/2  fsleep    8:44 29.88% semaphore
> 72237 root      10   20 9632K 1520K onproc/0  fsleep    8:41 27.20% semaphore
> 42031 root      10   20 9648K 1576K onproc/8  fsleep    8:39 27.10% semaphore
> 97960 root      10   20 9644K 1536K onproc/8  fsleep    8:39 26.46% semaphore
> 76525 root      10    0   95M   57M sleep/12  fsleep   10:01 12.84% nfdump
> 68093 root      10   20   96M   64M sleep/3   fsleep   10:07 12.11% nfdump
> 94072 root      -5   20   27M   11M sleep/3   biowait   4:42  1.03% pigz
> 52734 root       2    0 1640K 2740K sleep/4   kqread    0:37  0.98% top
> 84043 root      10   20   27M   11M sleep/3   inode     2:07  0.34% pigz
> 95028 root      10   20   27M   11M sleep/4   inode     2:09  0.15% pigz
> 66823 root      10   20   26M   11M sleep/0   inode     2:07  0.05% pigz
> 59751 root       2    0 2768K 3244K sleep/0   kqread    0:09  0.05% tmux
> 58124 root     -22    0    0K    4K sleep/1   -        37:01  0.00% idle1
> 59513 root     -22    0    0K    4K sleep/2   -        36:27  0.00% idle2
> 
> Any recommendations on what could help with debugging this?

Run a WITNESS kernel.  Uncomment the '#option WITNESS' line in
src/sys/arch/amd64/conf/GENERIC.MP, then build a fresh kernel after
running make clean and make config.  Set sysctl kern.witness.watch=2
to get stack traces.  It might report some false positives or known
bugs.

Maybe it finds something.  The best we can expect is a panic instead
of a hang.  Then 'show all locks' in ddb and a trace on all CPUs would
be useful.

> >How-To-Repeat:
> 
> Start lots of thread-intensive programs, like pigz(1), nfdump(1), etc.
> I also had a simple C test program using SYSV IPC semaphores running.
> The problem seems to require a reasonable number of CPUs (16 or more)
> to manifest itself.
> 
> >Fix:
> 
> Unknown.
> 
> Thanks,
> --Kor
