Hi,

I'm running HAProxy 2.0.14 with 8 threads. At modest rates of plain HTTP
traffic (< 10K requests per second) I'm noticing infrequent latency spikes
beyond 1 second, where the typical response time is 10-20ms. I tracked this
down to the kernel dropping SYN packets when the listen backlog is full
(set to the default of 128), causing the client to wait and retransmit.
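
For what it's worth, I confirmed the drops with counters along these lines
(exact counter names vary a bit between kernels):

  # cumulative listen-queue overflows / dropped SYNs
  nstat -az TcpExtListenOverflows TcpExtListenDrops

  # for LISTEN sockets, ss reports the current queue length in Recv-Q
  # and the configured backlog limit in Send-Q
  ss -ltn
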
No thread ever consumes more than ~20% CPU, and reducing the thread count
to 4 or increasing it to 10 didn't make any difference. If I enable
profiling and run "show activity" I see something like:


thread_id: 3 (1..8)
date_now: 1589389307.647623
loops: 10033501 [ 1205167 1326904 1208367 1287760 1191103 1233670 1330192 1250338 ]
wake_cache: 6121018 [ 738136 805155 739905 783825 730256 753284 808187 762270 ]
wake_tasks: 776106 [ 97666 97331 96197 96812 96564 97435 97536 96565 ]
wake_signal: 0 [ 0 0 0 0 0 0 0 0 ]
poll_exp: 0 [ 0 0 0 0 0 0 0 0 ]
poll_drop: 24795 [ 3162 2921 3219 3091 3253 3075 2987 3087 ]
poll_dead: 0 [ 0 0 0 0 0 0 0 0 ]
poll_skip: 0 [ 0 0 0 0 0 0 0 0 ]
fd_lock: 62727 [ 7218 8514 7635 8552 6940 7743 7950 8175 ]
conn_dead: 0 [ 0 0 0 0 0 0 0 0 ]
stream: 4465738 [ 562963 557283 555155 557415 558798 558987 559869 555268 ]
pool_fail: 0 [ 0 0 0 0 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 0 0 0 0 ]
empty_rq: 1671957 [ 158636 276257 169159 243426 148618 184667 281070 210124 ]
long_rq: 709 [ 98 89 86 92 83 75 101 85 ]
ctxsw: 21980740 [ 2770297 2741707 2736755 2742773 2751666 2751871 2750818 2734853 ]
tasksw: 7271901 [ 914523 908506 905425 907952 909303 910286 911606 904300 ]
cpust_ms_tot: 1930 [ 221 306 210 200 224 256 289 224 ]
cpust_ms_1s: 54 [ 5 11 5 0 6 5 12 10 ]
cpust_ms_15s: 174 [ 20 32 18 1 20 16 37 30 ]
avg_loop_us: 21 [ 20 20 22 19 19 22 23 21 ]
accepted: 1017514 [ 85086 173212 91525 140507 79202 102570 228342 117070 ]
accq_pushed: 1017512 [ 128546 126727 126694 126806 127644 127440 127054 126601 ]
accq_full: 2 [ 0 1 0 0 0 1 0 0 ]
accq_ring: 0 [ 0 0 0 0 0 0 0 0 ]
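
For completeness, this is roughly how I'm enabling profiling and collecting
the dumps (the stats socket path is illustrative):

  echo "set profiling tasks on" | socat stdio /var/run/haproxy.sock
  echo "show activity" | socat stdio /var/run/haproxy.sock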


The output of "show info" is:

Name: HAProxy
Version: 2.0.14
Release_date: 2020/04/02
Nbthread: 8
Nbproc: 1
Process_num: 1
Pid: 46
Uptime: 0d 0h06m09s
Uptime_sec: 369
Memmax_MB: 0
PoolAlloc_MB: 11
PoolUsed_MB: 5
PoolFailed: 0
Ulimit-n: 200190
Maxsock: 200190
Maxconn: 100000
Hard_maxconn: 100000
CurrConns: 120
CumConns: 2880939
CumReq: 5688700
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 0
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 0
ConnRateLimit: 0
MaxConnRate: 9411
SessRate: 0
SessRateLimit: 0
MaxSessRate: 9562
SslRate: 0
SslRateLimit: 0
MaxSslRate: 0
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 0
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 418
Run_queue: 2
Idle_pct: 100



I noticed that cpust_ms_tot can suddenly jump by hundreds. At the same
time, I see a large bump in actconn. I've measured the rate of new
connections and it seems stable enough, with no spikes large enough to
account for overflowing the socket backlog. I don't see any alert logs.
This is not running in a virtualised environment, and I've tried both
pinning the threads to CPUs and leaving them unpinned. I've also tried
tuning maxaccept, maxpoll and the listener multi-queue to no avail; the
snippet below shows roughly what I toggled.
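
A sketch of the relevant global settings (values illustrative, not
verbatim from my config):

  global
      nbthread 8
      # cpu-map auto:1/1-8 0-7       # tried with and without pinning
      tune.maxaccept 64              # tried a few values here
      tune.maxpollevents 200         # and here
      tune.listener.multi-queue off  # tried both on and off
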
I know I can bump somaxconn to absorb these bursts and avoid the
retransmission delay, but I'd like to understand the root cause of why
they're occurring in the first place.
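
(If I do fall back to that, I believe it has to be raised in both places,
since on Linux the listen() backlog is capped by net.core.somaxconn;
values illustrative:)

  sysctl -w net.core.somaxconn=4096

  # and in haproxy.cfg, on the frontend:
  frontend fe_http
      backlog 4096
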
Do you know what I could look at to help identify the cause?

Thanks
Colin
