Hi, I'm using HAProxy 2.0.14 with 8 threads. At modest amounts of regular HTTP traffic (< 10K requests per second) I'm noticing infrequent latency spikes beyond 1 second, where the typical response is 10-20ms. I tracked this down to the kernel dropping SYN packets when the listen backlog is full (set to the default of 128), causing clients to delay and retransmit. Each of the 8 threads never consumes more than ~20% CPU, and reducing threads to 4 or increasing to 10 didn't make any difference. If I enable profiling, "show activity" gives something like:
thread_id: 3 (1..8)
date_now: 1589389307.647623
loops: 10033501 [ 1205167 1326904 1208367 1287760 1191103 1233670 1330192 1250338 ]
wake_cache: 6121018 [ 738136 805155 739905 783825 730256 753284 808187 762270 ]
wake_tasks: 776106 [ 97666 97331 96197 96812 96564 97435 97536 96565 ]
wake_signal: 0 [ 0 0 0 0 0 0 0 0 ]
poll_exp: 0 [ 0 0 0 0 0 0 0 0 ]
poll_drop: 24795 [ 3162 2921 3219 3091 3253 3075 2987 3087 ]
poll_dead: 0 [ 0 0 0 0 0 0 0 0 ]
poll_skip: 0 [ 0 0 0 0 0 0 0 0 ]
fd_lock: 62727 [ 7218 8514 7635 8552 6940 7743 7950 8175 ]
conn_dead: 0 [ 0 0 0 0 0 0 0 0 ]
stream: 4465738 [ 562963 557283 555155 557415 558798 558987 559869 555268 ]
pool_fail: 0 [ 0 0 0 0 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 0 0 0 0 ]
empty_rq: 1671957 [ 158636 276257 169159 243426 148618 184667 281070 210124 ]
long_rq: 709 [ 98 89 86 92 83 75 101 85 ]
ctxsw: 21980740 [ 2770297 2741707 2736755 2742773 2751666 2751871 2750818 2734853 ]
tasksw: 7271901 [ 914523 908506 905425 907952 909303 910286 911606 904300 ]
cpust_ms_tot: 1930 [ 221 306 210 200 224 256 289 224 ]
cpust_ms_1s: 54 [ 5 11 5 0 6 5 12 10 ]
cpust_ms_15s: 174 [ 20 32 18 1 20 16 37 30 ]
avg_loop_us: 21 [ 20 20 22 19 19 22 23 21 ]
accepted: 1017514 [ 85086 173212 91525 140507 79202 102570 228342 117070 ]
accq_pushed: 1017512 [ 128546 126727 126694 126806 127644 127440 127054 126601 ]
accq_full: 2 [ 0 1 0 0 0 1 0 0 ]
accq_ring: 0 [ 0 0 0 0 0 0 0 0 ]

The output of "show info" is:

Name: HAProxy
Version: 2.0.14
Release_date: 2020/04/02
Nbthread: 8
Nbproc: 1
Process_num: 1
Pid: 46
Uptime: 0d 0h06m09s
Uptime_sec: 369
Memmax_MB: 0
PoolAlloc_MB: 11
PoolUsed_MB: 5
PoolFailed: 0
Ulimit-n: 200190
Maxsock: 200190
Maxconn: 100000
Hard_maxconn: 100000
CurrConns: 120
CumConns: 2880939
CumReq: 5688700
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 0
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 0
ConnRateLimit: 0
MaxConnRate: 9411
SessRate: 0
SessRateLimit: 0
MaxSessRate: 9562
SslRate: 0
SslRateLimit: 0
MaxSslRate: 0
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 0
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 418
Run_queue: 2
Idle_pct: 100

I've noticed that cpust_ms_tot can suddenly jump by several hundred, and at the same time I see a large bump in actconn. I've measured the rate of new connections and it seems stable enough; there are no large spikes that would account for overflowing the socket backlog, and I don't see any alert logs. This is not run in a virtualised environment, and I've tried both pinning the threads to CPUs and not. I've also tried tuning maxaccept, maxpoll and multi-queue, to no avail. I know I can bump somaxconn to absorb these drops and avoid the retransmission delay, but I'd like to understand the root cause of why they're occurring in the first place. Do you know what I could look at to help identify the cause?

Thanks,
Colin
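P.S. For completeness, the workaround I'm holding off on would look like the sketch below: raise the kernel cap and ask HAProxy for a deeper backlog. The values are illustrative, not a recommendation:

```shell
# Kernel-side cap on listen backlogs (runtime change; persist it in
# /etc/sysctl.d/ to survive reboots). The effective backlog is
# min(what the listener requests, net.core.somaxconn).
sysctl -w net.core.somaxconn=1024

# HAProxy side, in haproxy.cfg: a frontend can request a larger
# backlog explicitly with the "backlog" directive, e.g.
#   frontend fe_main
#       bind :80
#       backlog 1024
```

I'd still rather understand why a stable ~10K req/s overflows a backlog of 128 before papering over it.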
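P.S. In case it helps, this is how I've been confirming the drops at the kernel level on Linux. The TcpExt counters in /proc/net/netstat are cumulative since boot, so I watch for increments during a latency spike (counter names can vary slightly by kernel version):

```shell
# ListenOverflows: SYNs that arrived while a socket's accept queue was full.
# ListenDrops: all SYNs dropped on listening sockets (a superset).
# /proc/net/netstat holds a "TcpExt:" header line followed by a
# "TcpExt:" value line; pair them up and print just these two counters.
awk '$1 == "TcpExt:" && !hdr { for (i = 2; i <= NF; i++) h[i] = $i; hdr = 1; next }
     $1 == "TcpExt:"        { for (i = 2; i <= NF; i++) v[i] = $i }
     END { for (i = 2; i in h; i++)
             if (h[i] == "ListenOverflows" || h[i] == "ListenDrops")
               print h[i], v[i] }' /proc/net/netstat

# Current accept-queue depth (Recv-Q) vs configured backlog (Send-Q)
# per listening socket, if iproute2's ss is installed:
command -v ss >/dev/null && ss -lnt || true
```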