Very interesting, thanks for the tip. I only see two requests there, one of
which seems like nonsense or a vulnerability scan
("\r\n\r\n\x00\x00\x00...."), and the other has a space in the path that's
being requested due to improper escaping. Neither of those is a huge deal
to me, though if the downstream server (nginx) would handle the space then
I suppose I'd want to use the accept-invalid-http-request option.

I finally captured some 504s in the debug logging. 129 since yesterday
afternoon. They all seem to look like this:
Mar 30 14:46:19.000 haproxy-k49 haproxy[19450]: x.x.x.x:49638
[30/Mar/2014:14:45:19.533] frontend_https~ tapp_http/tapp-m2t
77/0/4/60000/60081 504 343 - - ---- 1255/1255/17/4/0 0/0 "GET /data/?a=b
HTTP/1.1"

I'm guessing that the 60000/60081 means that some 60s timeout threshold was
hit, and that the request took 60.081 seconds in total, which caused the
504. Is that correct? I
am also guessing that this is caused by slowness on the downstream
application servers. I do see a spike in the number of requests at the same
times as these 504s, and I suspect adding more downstream servers (with
balance leastconn) will help here.
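
To make the timer block easier to read, here is a minimal sketch that splits it into named fields. It assumes the block follows the Tq/Tw/Tc/Tr/Tt layout that "option httplog" uses in HAProxy 1.5 (as I understand the documentation); the function name parse_timers is just an illustration, not anything from HAProxy itself.

```python
# Assumed field order for the HAProxy 1.5 "option httplog" timer block:
#   Tq: time to receive the full request from the client (ms)
#   Tw: time spent waiting in queues (ms)
#   Tc: time to establish the connection to the server (ms)
#   Tr: time waiting for the server's response headers (ms)
#   Tt: total session time (ms)
TIMER_FIELDS = ("Tq", "Tw", "Tc", "Tr", "Tt")


def parse_timers(block):
    """Split a 'Tq/Tw/Tc/Tr/Tt' string into a dict of integer milliseconds."""
    parts = block.split("/")
    if len(parts) != len(TIMER_FIELDS):
        raise ValueError("expected %d timers, got %d" % (len(TIMER_FIELDS), len(parts)))
    return dict(zip(TIMER_FIELDS, (int(p) for p in parts)))


# Timer block copied from the 504 log line above.
timers = parse_timers("77/0/4/60000/60081")
for name in TIMER_FIELDS:
    print(name, timers[name], "ms")
```

If that layout is right, the 60000 is the time spent waiting for the server's response and the 60081 is the total for the whole request, which would fit a 60s limit on the server side of the connection.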

I'm a little confused by the 60s timeout, though. I have these (below) in
my config, so I'd expect the timeout that's causing the 504 to be the
"timeout server", which isn't 60s.

defaults
  log global
  option httplog
  retries 3
  option redispatch
  maxconn 8196
  timeout connect 20s
  timeout client 80s
  timeout server 80s
  timeout http-keep-alive 50s
  stats enable
  stats auth user:password
  stats uri /stats

Is my 80s server timeout not taking effect for some reason, or is the 60s
some other setting? I'm mostly just curious, since 80s was artificially
high (while I try to address these problems), and I'll probably lower it to
50s before I'm done.
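
One way to rule out an override inside HAProxy itself is to list every timeout directive per section, since a value set in a backend or listen section takes precedence over the defaults section. The sketch below is only an illustration: the function name is made up, and the "backend tapp_http" stanza with its 60s value in SAMPLE is hypothetical, added purely to show what an override would look like. It is not taken from my real config.

```python
SECTION_KEYWORDS = ("global", "defaults", "frontend", "backend", "listen")
# Deprecated single-word spellings still accepted by HAProxy 1.5.
LEGACY_TIMEOUTS = ("contimeout", "clitimeout", "srvtimeout")


def timeouts_by_section(config_text):
    """Return (section, timeout_name, value) tuples found in a config string."""
    section = None
    found = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            section = " ".join(words[:2])  # e.g. "backend tapp_http"
        elif words[0] == "timeout" and len(words) >= 3:
            found.append((section, words[1], words[2]))
        elif words[0] in LEGACY_TIMEOUTS and len(words) >= 2:
            found.append((section, words[0], words[1]))
    return found


SAMPLE = """
defaults
  timeout connect 20s
  timeout client 80s
  timeout server 80s
  timeout http-keep-alive 50s

# HYPOTHETICAL stanza, for illustration only
backend tapp_http
  timeout server 60s
"""

overrides = timeouts_by_section(SAMPLE)
for section, name, value in overrides:
    print(section, name, value)
```

If only the defaults entries show up for the real file, the 60s limit is more likely being applied somewhere else in the chain.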

Thanks,
Patrick


On Sun, Mar 30, 2014 at 2:22 PM, Baptiste <[email protected]> wrote:

> Hi Patrick,
>
> Just issue a 'show errors' on HAProxy stats socket and you'll know why
> these requests have been denied.
> You can also give a try to the 'option accept-invalid-http-request' to tell
> haproxy to be less strict about HTTP checking...
>
> Baptiste
>
>
> On Sat, Mar 29, 2014 at 9:37 PM, Patrick Schless
> <[email protected]> wrote:
> > sorry, sent that before it was ready. here's the complete message:
> >
> >
> > Some more info:
> >
> > I am getting reports from users of high numbers of 504s.
> >
> > My timeouts are pretty high (while trying to debug this problem), so it
> > doesn't seem like they are the issue:
> >   timeout connect 20s
> >   timeout client 80s
> >   timeout server 80s
> >   timeout http-keep-alive 50s
> >
> > I have http logging on, but I am not seeing any 5xx responses (almost all
> > 200s, with a low number of 4xx, which seems about right).
> >
> > I am tracking the count of all status codes, and it seems that the ereq
> > count tracks pretty closely to the number of 400s that are in the haproxy
> > log (though the logs have a couple more 400s than are reported by stats).
> >
> > Logs for these 400s look like one of three things:
> > 1: Mar 29 15:13:42.000 haproxy-k49 haproxy[19450]: xx.xx.xx.xx:45381
> > [29/Mar/2014:15:13:42.181] frontend_https~ frontend_https/<NOSRV>
> > -1/-1/-1/-1/46 400 187 - - CR-- 1031/1031/0/0/0 0/0 "<BADREQ>"
> >
> > 2: Mar 29 15:13:41.000 haproxy-k49 haproxy[19450]: xx.xx.xx.xx:38440
> > [29/Mar/2014:15:13:40.874] frontend_https~ tapp_http/tapp-p8b
> > 394/0/1/29/424 400 183 - - ---- 1046/1046/23/9/0 0/0 "GET
> > /v1/data/?key=k1&interval=1min&function=mean&start=2014-03-29T20%3A13%3A44.000Z&end=2014-03-29T20%3A13%3A40.623Z
> > HTTP/1.1"
> >
> > 3: Mar 29 15:09:19.000 haproxy-k49 haproxy[19450]: xx.xx.xx.xx:51969
> > [29/Mar/2014:15:09:19.213] frontend_https~ frontend_https/<NOSRV>
> > -1/-1/-1/-1/118 400 187 - - PR-- 1087/1087/0/0/0 0/0 "<BADREQ>"
> >
> > These are spread across a variety of customers and don't seem related to
> > SSL (since some of the errors are on the http frontend). The counts for
> > the various types of 400s are here:
> > [patrick@haproxy-k49 ~]$ sudo grep haproxy /var/log/messages | grep -E
> > "[0-9] 400 [0-9]" | awk '{print $6 " " $9 " " $11 " " $15}' | sed
> > s/:[0-9]*// | sed s/tapp-.../tapp-abc/ | sort | uniq -c | sed
> > s/[0-9][0-9][0-9]\\?\\./x./g
> >      37 x.x.x.245 tapp_http/tapp-abc 400 ----
> >      12 x.x.x.25 frontend_http/<NOSRV> 400 CR--
> >    1182 x.x.x.35 frontend_https/<NOSRV> 400 CR--
> >       1 x.x.x.94 frontend_http/<NOSRV> 400 CR--
> >      35 x.x.x.65 tapp_http/tapp-abc 400 ----
> >       8 x.x.x.29 frontend_http/<NOSRV> 400 PR--
> >      89 x.x.x.96 frontend_https/<NOSRV> 400 PR--
> >
> >
> > My guess is that requests like (2) are the ones that end up as 400s but
> > don't register as ereq's (just due to the low frequency of them).
> >
> > The lines like (1) (the CR lines) I'm assuming are premature closes by the
> > client, and there's maybe nothing I can do about that.
> >
> > For lines like (3) (the PR lines), I don't understand why the proxy is
> > denying them. Is there any way to see exactly what is being sent for these
> > connections?
> >
> > Thanks,
> > Patrick
> >
> >
> > On Sat, Mar 29, 2014 at 3:12 PM, Patrick Schless <[email protected]>
> > wrote:
> >>
> >> Some more info:
> >>
> >> I am getting reports from users of high numbers of 504s.
> >>
> >> My timeouts are pretty high (while trying to debug this problem), so it
> >> doesn't seem like they are the issue:
> >>   timeout connect 20s
> >>   timeout client 80s
> >>   timeout server 80s
> >>   timeout http-keep-alive 50s
> >>
> >> I have http logging on, but I am not seeing any 5xx responses (almost
> >> all 200s, with a low number of 4xx, which seems about right).
> >>
> >>
> >> On Fri, Mar 28, 2014 at 7:23 PM, Patrick Schless
> >> <[email protected]> wrote:
> >>>
> >>> I am running on 1.5 dev22, and doing SSL termination. Traffic seems to
> >>> be handled fine, but my ereq is steadily rising. Poking at the source,
> >>> it looks like this can be caused by a number of different errors.
> >>>
> >>> What's the next step for trying to determine what's causing these? I
> >>> tried bumping my connect and cli timeouts, but that didn't change
> >>> anything.
> >>>
> >>>
> >>> Thanks,
> >>> Patrick
> >>
> >>
> >
>
