Hi, i have a problem with worker high CPU usage on recent Rocky Linux 9 with nginx setup.
I have 10 identical servers, each with Mellanox ConnectX-5 dual 25Gbps NIC, 6x Samsung PM983 NVMe drives, 512GB RAM and Epyc 7402 CPU. 9 of them are using RockyLinux 8.7 (some with stock, some with 5.18.10-1.el8.elrepo.x86_64 kernel) with nginx and passing ~20Gbps of SSL/http2 traffic using 24workers and reuseport config option. Each worker have about 10-12% CPU usage and nginx worker strace looks like this (normal): strace: Process 73930 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 72.59 0.018036 1 9329 195 write 15.61 0.003879 9 400 io_submit 5.70 0.001416 2 681 epoll_wait 2.93 0.000729 1 569 88 read 1.81 0.000449 1 394 io_getevents 0.81 0.000202 3 51 sendto 0.52 0.000129 1 106 fcntl 0.02 0.000005 5 1 shutdown 0.01 0.000002 1 2 epoll_ctl 0.00 0.000000 0 5 getpid 0.00 0.000000 0 1 recvfrom 0.00 0.000000 0 1 setsockopt 0.00 0.000000 0 1 accept4 ------ ----------- ----------- --------- --------- ---------------- 100.00 0.024847 2 11541 283 total On 10th server we are now testing Rocky Linux 9 configuration and each worker is using about 60-80% of CPU with following strace from nginx worker: strace: Process 1966 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 88.44 0.461237 1130 408 io_submit 10.68 0.055674 2 18658 147 write 0.25 0.001283 1 655 191 read 0.20 0.001033 2 435 epoll_wait 0.18 0.000914 5 156 fcntl 0.08 0.000420 1 306 futex 0.07 0.000378 5 65 sendto 0.07 0.000348 1 271 io_getevents 0.02 0.000085 9 9 openat 0.01 0.000037 12 3 accept4 0.01 0.000034 8 4 close 0.00 0.000023 2 9 newfstatat 0.00 0.000021 0 23 getpid 0.00 0.000015 2 6 setsockopt 0.00 0.000008 1 6 epoll_ctl 0.00 0.000006 2 3 recvfrom ------ ----------- ----------- --------- --------- ---------------- 100.00 0.521516 24 21017 338 total As you can see, it is spending a lot of time in io_submit. It has 1130 usec/call while RL8 servers have only 9 usec/call in io_submit. Only thing that has changed is RL8->RL9. All servers have identical hardware. All servers are serving almost identical amount of traffic ~19-20Gbps from identical files with identical number of connections. Iostat on all servers shows about 50% util on each drive, similar rKB/s, r/s and rareq-sz so disk IO should not be a problem. nginx-1.22.1 compiled with: ./configure --prefix=/usr/local/nginx --with-http_mp4_module --with-http_secure_link_ module --with-http_stub_status_module --with-http_ssl_module --with-http_v2_module --with-pcre --with-file-aio --with-threads --with-cc-opt=' -DTCP_FASTOPEN=23' --with-http_sub_module Important parts of nginx.conf: worker_processes 24; worker_cpu_affinity auto; worker_rlimit_nofile 81920; events { worker_connections 2000; use epoll; } sendfile off; aio on; directio 4096; directio_alignment 4k; tcp_nodelay on; Any ideas what could be causing the problem? _______________________________________________ nginx mailing list -- nginx@nginx.org To unsubscribe send an email to nginx-le...@nginx.org