Hi Lukas.

On 23.01.2019 at 10:24, Luke Seelenbinder wrote:
> Hi Willy,
>
> Thanks for continuing to look into this.
>
>> I've placed an nginx instance after my local haproxy dev config, and
>> found something which might explain what you're observing: the process
>> apparently leaks FDs and fails once in a while, causing 500 to be returned:
>
> That's fascinating. I would have thought nginx would have had a bit more
> care given to things like that...
This can be fixed by increasing the ulimits ;-).

> Oddly enough, I cannot find any log entries that approximate this. However,
> it's possible, since we're primarily (99+%) using nginx as a reverse proxy,
> that the fd issues wouldn't appear for us.

What's the ulimit for your nginx process?

> My next thought is to try tcpdump to try to determine what's on the wire when
> the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might
> prove difficult. Any suggestions?

If you have enough log space, you can try to activate the debug log in nginx
and haproxy:

https://nginx.org/en/docs/debugging_log.html
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#log => debug

This will have some impact on performance, as every request creates a lot of
log lines!

It would be interesting to see which error shows up in the nginx log when the
CD/SD flags happen, as 'http2 flood detected' is not in the logs.

Which release of nginx do you use?
http://hg.nginx.org/nginx/tags

Maybe there are some errors in the log which can be found in this directory:
http://hg.nginx.org/nginx/file/release-1.15.8/src/http/v2/

> One more interesting piece of data: if we use htx without h2 on the backends,
> we only see CD-- entries consistently (with very, very few SD-- entries).
> Thus, it would seem whatever is causing the issue is directly related to h2
> backends. I further think we can safely say it is directly related to h2
> streams breaking (due to client-side request cancellations) resulting in the
> whole connection breaking in HAProxy or nginx (though determining which will
> be the trick).
>
> There's also a strong possibility we replace nginx with HAProxy entirely for
> our SSL + H2 setup as we overhaul the backends, so this problem will probably
> be resolved by removing the problematic interaction.

What was the main reason to put nginx between haproxy and the backends?
What are the backends?
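For the ulimit and debug-log points above, a minimal sketch of the nginx side (paths and numbers are placeholders, adjust to your setup; nginx needs to be built with --with-debug for the full debug log, and the OS-level ulimit, e.g. LimitNOFILE= in a systemd unit, must be raised as well):

```
# nginx.conf -- hypothetical sketch, not your actual config

# raise the per-worker fd limit so the reverse proxy does not run into
# "24: Too many open files" under load
worker_rlimit_nofile  65536;

# debug-level error log; very verbose, only enable temporarily
error_log  /var/log/nginx/error.log  debug;
```

On the haproxy side the rough equivalent is appending the `debug` level to the `log` directive, e.g. `log /dev/log local0 debug`, as described in the configuration link above.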
Regards
Aleks

> I'm still working on running h2load against our nginx servers to see if that
> turns anything up.
>
>> And at this point the connection is closed and reopened for new requests.
>> There's never any GOAWAY sent.
>
> If I'm understanding this correctly, that implies that as long as nginx sends
> GOAWAY properly, HAProxy will not attempt to reuse the connection?
>
>> I managed to work around the problem by limiting the number of total
>> requests per connection. I find this extremely dirty but if it helps...
>> I just need to figure out how best to do it, so that we can use it as well
>> for H2 as for H1.
>
> We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will
> probably stick with that for now, since we don't want to have any more
> operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per
> se, just a business reality. :-) ) I'm more than happy to try out anything
> you turn up on our staging setup!
>
> Best,
> Luke
>
> —
> Luke Seelenbinder
> Stadia Maps | Founder
> stadiamaps.com
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, January 23, 2019 8:28 AM, Willy Tarreau <w...@1wt.eu> wrote:
>
>> Hi Luke,
>>
>> I've placed an nginx instance after my local haproxy dev config, and
>> found something which might explain what you're observing: the process
>> apparently leaks FDs and fails once in a while, causing 500 to be returned:
>>
>> 2019/01/23 08:22:13 [crit] 25508#0: *36705 open()
>> "/usr/local/nginx/html/index.html" failed (24: Too many open files), client:
>> 1
>> 2019/01/23 08:22:13 [crit] 25508#0: accept4() failed (24: Too many open
>> files)
>>
>> 127.0.0.1 - - [23/Jan/2019:08:22:13 +0100] "GET / HTTP/2.0" 500 579 "-"
>> "Mozilla/4.0 (compatible; MSIE 7.01; Windows)"
>>
>> The same as seen by haproxy:
>>
>> 127.0.0.1:47098 [23/Jan/2019:08:22:13.589] decrypt trace/ngx 0/0/0/0/0 500
>> 701 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
>>
>> And at this point the connection is closed and reopened for new requests.
>> There's never any GOAWAY sent.
>>
>> I managed to work around the problem by limiting the number of total
>> requests per connection. I find this extremely dirty but if it helps...
>> I just need to figure out how best to do it, so that we can use it as well
>> for H2 as for H1.
>>
>> Best regards,
>> Willy
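P.S.: On Willy's "limit the number of total requests per connection" workaround: HAProxy 1.9 gained a `max-reuse` server keyword for exactly this kind of cap on connection reuse. A hedged sketch (server name, address, and the reuse count below are made-up placeholders):

```
# haproxy.cfg -- hypothetical backend section
backend h2_backends
    mode http
    # close an idle backend connection after it has been reused
    # this many times, instead of reusing it indefinitely
    server ngx1 192.0.2.10:443 ssl verify none alpn h2 max-reuse 1000
```

The idea is that if the FD leak or stream breakage only bites after many requests on one connection, recycling connections early keeps you clear of it, at the cost of more connection setup work.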
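P.P.S.: For the h2load test Luke mentions, a typical invocation (h2load ships with nghttp2; the URL and the numbers here are placeholders for your staging setup) would be something like:

```
# -n total requests, -c concurrent clients, -m max concurrent streams per conn
h2load -n 100000 -c 10 -m 32 https://staging.example.com/
```

Pushing `-m` up and interrupting clients mid-run is one way to try to reproduce the client-side request cancellations that seem to trigger the CD--/SD-- entries.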