Hi Patrick,
On Tue, Apr 01, 2014 at 03:20:15PM -0400, Patrick Hemmer wrote:
> We have an issue with haproxy (1.5-dev22-1a34d57) where it is
> intermittently not connecting to the backend server. However the
> behavior it is exhibiting seems strange.
> The reason I say strange is that in one example, it logged that the
> client disconnected after ~49 seconds with a connection flags of "CC--".
> However our config has "timeout connect 5000", so it should have timed
> out connecting to the backend server after 5 seconds. Additionally we
> have "retries 3" in the config, so upon timing out, it should have tried
> another backend server, but it never did (the retries counter in the log
> shows "0").
No, retries impacts only retries to the same server, it's "option redispatch"
which allows the last retry to be performed on another server. But you have
it anyway.
> At the time of this log entry, the backend server is responding
> properly. For the ~49 seconds prior to the log entry, the backend server
> has taken other requests. The backend server is also another haproxy
> (same version).
>
> Here's an example of one such log entry:
[fixed version pasted here]
> 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563
> 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET
> /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943
> HTTP/1.1
OK in fact the client did not wait 49 seconds. If you look closer, you'll
see that the client remained silent for 49 seconds (typically a connection
pool or a preconnect) and closed immediately after sending the request (in
the same millisecond). Since you have "option abortonclose", the connection
was aborted before the server had a chance to respond.
So I can easily imagine that you randomly get this error, you're in a race
condition, if the server responds immediately, you win the race and the
request is handled, otherwise it's aborted.
Please start by removing "option abortonclose", I think it will fix the issue.
Second thing you can do is to remove "option httpclose" or replace it with
"option http-server-close" which is active and not just passive. The connections
will last less time on your servers which is always appreciated.
I'm not seeing any other issue, so with just this you should be fine.
> The log format is defined as:
> %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\
> %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>
> Running a "show errors" on the stats socket did not return any relevant
> results.
>
> Here's the relevant portions of the haproxy config. It is not the entire
> thing as the whole config is 1,513 lines long.
>
> global
> log 127.0.0.1 local0
> maxconn 20480
> user haproxy
> group haproxy
> daemon
> stats socket /var/run/hapi/haproxy/haproxy.sock level admin
>
> defaults
> log global
> mode http
> option httplog
> option dontlognull
> option log-separate-errors
> retries 3
> option redispatch
> timeout connect 5000
> timeout client 60000
> timeout server 170000
> option clitcpka
> option srvtcpka
> option abortonclose
> option splice-auto
> monitor-uri /haproxy/ping
> stats enable
> stats uri /haproxy/stats
> stats refresh 15
> stats auth user:pass
>
> frontend api
> bind *:80
> bind *:443 ssl crt /etc/haproxy/server.pem
> maxconn 20000
> option httpclose
> option forwardfor
> acl internal src 10.0.0.0/8
> acl have_request_id req.fhdr(X-Request-Id) -m found
> http-request set-nice -100 if internal
> http-request add-header X-API-URL %[path] if !internal
> http-request add-header X-Request-Timestamp %Ts.%ms
> http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if
> internal have_request_id
> http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal ||
> !have_request_id
> http-request add-header X-API-Host i-4a3b1c6a
> unique-id-format %{+X}o%pid-%rt
> log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\
> %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
> default_backend DEFAULT_404
>
> acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
>
> acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
> acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE CONNECT
> PATCH
> http-request set-header X-Rewrite-Id nqXn if !rewrite-found nqXn_path
> nqXn_method
> acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
> use_backend platform-push if rewrite-nqXn
> reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\
> /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\
> /1/sync/subscribe/\2\3 if rewrite-nqXn
>
>
> backend platform-push
> option httpchk GET /ping
> default-server inter 15s fastinter 1s
> server i-6eaf724d 10.230.23.64:80 check observe layer4
> server i-84d931a5 10.230.42.8:80 check observe layer4
>
Regards,
Willy