Hi William, On Mon, Oct 21, 2019 at 11:14:20AM +0200, William Dauchy wrote: > On a production environment with a given maxconn, the fd limit can be > successfully set at a given time. While raising maxconn, a new max fd > limit is calculated. If the setrlimit fails (e.g if sysctl fs.nr_open > is lower), we silently revert to the max obtained through getrlimit, > which can be way lower than the previous one set, with a working maxconn; > we only deliver a warning at startup. > In our production environment we rollout new configuration through > progressive reload. If the reload/restart fails (i.e if the health check > fails), the rollout is stopped. Before this patch, the new maxconn is > applied silently whether setrlimit succeed or not; i.e you might easily > face a case where fd limit is reverted to rlim_max making your > production environment unstable very quickly.
Yes, I'm well aware of this impact as well :-/ > That's why we propose to explicitly fail when fd limit cannot be raised > to avoid bad surprises when a new maxconn is set. > We extended the failing case to all setrlimit. This seem to be more in > correlation with other config errors where haproxy immediately fails. To be honest, I'm quite embarrassed by such a change. I think the main reason for the initial warning instead of error was that most people using haproxy in unprivileged environments never knew what to use as a ulimit-n or maxconn value, they only needed something that worked fine for small traffic. That's also how developers use it by the way, each time I start a config with explicit settings, I receive this warning and it allows me to test it anyway. My concern is that this will cause a large number of deployed system to simply stop working. On the other hand, we do have automatic settings now (at least for the FD) so it is easier for such users to simply drop the maxconn line and be fine with it. Probably that a good approach would be to have a new directive (or set of directives) such as "strict-limits" which would indicate if we need to warn or fail. We could then decide to let it warn by default and change it to strict mode by default later. Leaving at least one round of LTS with a warning indicating that this needs to be fixed or it will fail in the next version will be OK. This basically means keeping the default to relaxed until 2.2 and make it strict by default in 2.3. I suspect that we should set the memory behavior separately, but at the same time I don't recall having ever seen this one. Thus maybe we could already make it strict by default. Anyway I think you get the general point, warn users first that it will break next time, then break by default. It will leave them with enough time to fix their config or report their constraints. Just let me know if that's a good enough plan for you. Cheers, Willy