Hi Frank,

On Fri, Feb 23, 2018 at 10:28:15AM +0000, Frank Schreuder wrote:
> > A few more things on the core dumps :
> >  - they are ignored if you have a chroot statement in the global section
> >  - you need not to use "user/uid/group/gid" otherwise the system also
> >    disables core dumps
> 
> I'm using chroot and user/group in my config, so I'm not able to share core 
> dumps.

Well, if you can at least attach gdb to a process, wait for it to stop, and
issue "bt full" to get the whole backtrace, it would help a lot.

> > There are very few abort() calls in the code :
> >  - some in the thread debugging code to detect recursive locks ;
> >  - one in the cache applet which triggers on an impossible case very
> >    likely resulting from cache corruption (hence a bug)
> >  - a few inside the Lua library
> >  - a few in the HPACK decompressor, detecting a few possible bugs there
> >
> > Except for Lua, all of them were added during 1.8, so depending on what the
> > configuration uses, there are very few possible candidates.
> 
> I added my configuration in this mail. Hopefully this will narrow down the
> possible candidates.

Well, at least you use neither threads, nor Lua, nor the cache, nor HTTP/2, so
it cannot come from any of the abort() calls we have identified. It could still
come from openssl however.

> I did some more research to the memory warnings we encounter every few days.
> It seems like the haproxy processes use a lot of memory. Would haproxy with
> nbthreads share this memory?

It depends. The memory will indeed be shared between threads started together,
but if this memory is allocated at load time and never modified afterwards, it
is already shared between the processes as well, thanks to copy-on-write after
the fork.
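If you want to see how much of this resident memory is really private to each
process versus shared, /proc can give a rough idea (1160 is just one of the
PIDs from your top output):

    # sum the Rss/Shared/Private counters of one haproxy process, in kB
    grep -E 'Rss|Shared|Private' /proc/1160/smaps | \
        awk '{sum[$1]+=$2} END {for (k in sum) print k, sum[k], "kB"}'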

>  1160 haproxy   20   0 1881720 1.742g   5504 S  83.9 11.5   1:53.38 haproxy
>  1045 haproxy   20   0 1880120 1.740g   5572 S  71.0 11.5   1:36.62 haproxy
>  1104 haproxy   20   0 1880376 1.741g   6084 R  64.6 11.5   1:46.29 haproxy
>  1079 haproxy   20   0 1881116 1.741g   5564 S  58.1 11.5   1:42.29 haproxy
>  1135 haproxy   20   0 1881240 1.741g   5564 S  58.1 11.5   1:49.85 haproxy
>    995 haproxy   20   0 1881852 1.742g   5584 R  38.7 11.5   1:30.05 haproxy
>  1020 haproxy   20   0 1881448 1.741g   5516 S  25.8 11.5   1:32.20 haproxy
>  4926 haproxy   20   0 1881008 1.718g   2176 S   6.5 11.3   3:11.74 haproxy
>  8526 haproxy   20   0 1878032   6516   1304 S   0.0  0.0   2:10.04 haproxy
>  8529 haproxy   20   0 1880336   5208      4 S   0.0  0.0   2:34.68 haproxy
> 11530 haproxy   20   0 1878748   6556   1392 S   0.0  0.0   2:25.94 haproxy
> 26938 haproxy   20   0 1882592   6032    892 S   0.0  0.0   3:56.79 haproxy
> 29577 haproxy   20   0 1880480 1.738g   3132 S   0.0 11.5   2:08.74 haproxy
> 31124 haproxy   20   0 1880776 1.740g   4284 S   0.0 11.5   2:58.84 haproxy
>   7548 root      20   0 1869896 1.731g   4456 S   0.0 11.4   1008:23 haproxy
> 
> I'm using systemd to reload haproxy for new SSL certificates every few 
> minutes.

OK. I'm seeing that you load certs from a directory in your config. Do you
have a high number of certs? I'm asking because we've already seen some
configs eat multiple gigabytes of RAM simply because a lot of certs were loaded.

In your case they're loaded twice (once for the IPv4 bind line, once for the
IPv6 one). William planned to work on a way to merge all identical certs and
keep a single instance of them when they are loaded multiple times, which
should already reduce the amount of memory consumed here.
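Just to get an idea of the order of magnitude, it would be interesting to know
how many certs each bind line loads (directory names taken from your config):

    # number of cert files loaded by each of the two SSL bind lines
    find /etc/haproxy/ssl/ /etc/haproxy/customer-ssl/ -type f | wc -l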

> Configuration:
(...)
> defaults
>     log global
>     timeout http-request 5s
>     timeout connect      2s
>     timeout client       125s
>     timeout server       125s
>     mode http
>     option dontlog-normal
>     option http-server-close
      ^^^^^^^^^^^^^^^^^^^^^^^^
It is very likely that you don't need this one anymore, and that you can
reduce the load on your servers by using keep-alive between haproxy and the
backend servers. But that's irrelevant to your current problem.
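If I remember right, keep-alive is the default HTTP mode in recent versions,
so simply removing the option should be enough; if you prefer to make it
explicit, something like this would do:

    defaults
        option http-keep-alive
        # option http-server-close    <- no longer needed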

(...)
> frontend fe_http
>     bind ipv4@:80 backlog 65534
>     bind ipv6@:80 backlog 65534
>     bind ipv4@:443 ssl crt /etc/haproxy/ssl/invalid.pem crt /etc/haproxy/ssl/ 
> crt /etc/haproxy/customer-ssl/ strict-sni backlog 65534
>     bind ipv6@:443 ssl crt /etc/haproxy/ssl/invalid.pem crt /etc/haproxy/ssl/ 
> crt /etc/haproxy/customer-ssl/ strict-sni backlog 65534
>     bind-process 1-7
>     tcp-request inspect-delay 5s
>     tcp-request content accept if { req_ssl_hello_type 1 }

This one is particularly strange. I suspect it's a leftover from an old
configuration dating from the days when haproxy didn't support SSL, because
it's looking for SSL messages inside the HTTP traffic, where they will never
be present. You can safely remove those two lines.

>     option forwardfor
>     acl secure dst_port 443
>     acl is_acme_request path_beg /.well-known/acme-challenge/
>     reqadd X-Forwarded-Proto:\ https if secure
>     default_backend be_reservedpage
>     use_backend be_acme if is_acme_request
>     use_backend 
> %[req.fhdr(host),lower,map_dom(/etc/haproxy/domain2backend.map)]
>     compression algo gzip
>     maxconn 32000

In my opinion it's not a good idea to let a single frontend take all of the
process's connections, as it will prevent you from connecting to the stats
page when a problem happens. You should use a slightly larger global maxconn
setting to avoid this.
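Something along these lines would do (values purely illustrative, to be
adjusted to your memory budget):

    global
        maxconn 33000     # a bit above the frontend's limit, leaves room for stats

    frontend fe_http
        maxconn 32000     # unchanged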

> backend be_acme
>     bind-process 1
>     option httpchk HEAD /ping.php HTTP/1.1\r\nHost:\ **removed hostname**
>     option http-server-close
>     option http-pretend-keepalive

Same comment here as above regarding close vs keep-alive.

Aside from this I really see nothing suspicious in your configuration that
could explain the problem. Let's hope you can at least either catch a core or
attach gdb to one of these processes.

Regards,
Willy
