Re: Haproxy 1.5.4 unable to accept new TCP request, backlog full, tens of thousands close_wait connection

Willy Tarreau Tue, 25 Apr 2017 02:24:08 -0700

On Tue, Apr 25, 2017 at 05:00:48PM +0800, jaseywang wrote:
> Here is the data with debug mode off, still the same issue:
> https://www.dropbox.com/s/4x0cjfv1o2kmwg3/analytics-debug-off.txt?dl=0
> 
> 
> Flat profile:
> 
> Each sample counts as 0.01 seconds.
>  no time accumulated
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  Ts/call  Ts/call  name
>   0.00      0.00     0.00   264321     0.00     0.00  hdr_idx_add
>   0.00      0.00     0.00   189187     0.00     0.00  strlcpy2
>   0.00      0.00     0.00   155794     0.00     0.00  ultoa_o
>   0.00      0.00     0.00   144666     0.00     0.00  ltoa_o
>   0.00      0.00     0.00   122408     0.00     0.00  http_find_header2
>   0.00      0.00     0.00    83596     0.00     0.00  fd_update_cache
>   0.00      0.00     0.00    83336     0.00     0.00  http_sync_req_state
>   0.00      0.00     0.00    78736     0.00     0.00  eb_delete
>   0.00      0.00     0.00    78198     0.00     0.00  eb32_insert
>   0.00      0.00     0.00    78043     0.00     0.00  conn_update_data_polling
>   0.00      0.00     0.00    67163     0.00     0.00  conn_fd_handler
>   0.00      0.00     0.00    66824     0.00     0.00  vars_init
>   0.00      0.00     0.00    66768     0.00     0.00  utoa_pad
>   0.00      0.00     0.00    61080     0.00     0.00  http_resync_states
>   0.00      0.00     0.00    61080     0.00     0.00  http_sync_res_state
(...)


To me this shows a perfectly healthy load balancer under a very low
load. There are not even any connection retries, everything looks OK.
I'm speechless.

Given that no time was reported in any function, it is possible that
all the time is spent in SSL handshakes, but I'm even starting to doubt
this now.

Do you see any CPU usage on the machine during the test? Is the CPU
on which haproxy runs saturated?

The next step will probably require strace to see where the time is wasted.
You can run it like this:

   strace -Tttvs200 -o strace.log -p $(pidof haproxy)

It will reveal the time spent in each syscall and between them. We may find
that a full millisecond is lost somewhere, helping to narrow the problem down.
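
Once strace.log exists, the search for a lost millisecond could be automated along these lines. This is only a rough sketch, not part of haproxy or strace: the function name `slow_spots` and the 1000 microsecond threshold are made up for illustration. It assumes the line format produced by the command above without `-f`, that is a leading `HH:MM:SS.microseconds` timestamp (from `-tt`) and a trailing `<seconds>` duration (from `-T`). Lines that do not fit this shape, such as signals or unfinished calls, are skipped, and a capture that crosses midnight is not handled.

```python
import re

# Assumed line shape from "strace -T -tt" without -f, for example:
# 10:00:00.000000 epoll_wait(3, [], 200, 1000) = 0 <0.000500>
LINE_RE = re.compile(
    r'^(\d{2}):(\d{2}):(\d{2})\.(\d{6}) (\w+)\(.*<(\d+\.\d+)>\s*$'
)


def slow_spots(lines, threshold_us=1000):
    """Return (kind, syscall, microseconds) tuples for suspicious delays.

    kind is "syscall" when the call itself took at least threshold_us,
    and "gap" when at least threshold_us elapsed between the end of the
    previous call and the start of this one (time spent outside syscalls).
    """
    found = []
    prev_end = None
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # signal lines, unfinished calls, exit markers, etc.
        hh, mm, ss, us, name, dur = m.groups()
        start = (int(hh) * 3600 + int(mm) * 60 + int(ss)) * 1_000_000 + int(us)
        dur_us = round(float(dur) * 1_000_000)
        if prev_end is not None and start - prev_end >= threshold_us:
            found.append(("gap", name, start - prev_end))
        if dur_us >= threshold_us:
            found.append(("syscall", name, dur_us))
        prev_end = start + dur_us
    return found
```

To use it, pass the open log file, e.g. `slow_spots(open("strace.log"))`. Expect most "syscall" hits to be `epoll_wait` idling for events, which is normal; the interesting entries are gaps and slow calls other than the poller.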

Willy
