Hi Pavlos,

On Mon, Jul 28, 2014 at 12:07:37AM +0200, Pavlos Parissis wrote:
> On 25/07/2014 07:28, Willy Tarreau wrote:
> > Hi all,
> 
> [..snip..]
> 
> 
> >   - hot reconfiguration : some users are abusing the reload mechanism to
> >     extreme levels, but that does not void their requirements. And many
> >     other users occasionally need to reload for various reasons such as
> >     adding a new server or backend for a specific customer. While in the
> >     past it was not possible to change a server address on the fly, we
> >     could now do it easily, so we could think about provisioning a few
> >     extra servers that could be configured at run time to avoid a number
> >     of reloads. Concerning the difficulty to bind the reloaded processes,
> >     Simon had done some work in this area 3 years ago with the master-
> >     worker model. Unfortunately we never managed to stabilize it because
> >     of the internal architecture that was hard to adapt and taking a lot
> >     of time. It could be one of the options to reconsider though, along
> >     with FD passing across processes. Similarly, persistent server states
> >     across reloads are often requested and should be explored.
> > 
> 
> Let's take this to another level and support on-line configuration
> changes for frontends, backends and servers which don't require a restart

We've already improved things significantly in this direction. We're at a
point where it should be easy to support on-the-fly server address change.
However there are still a large number of things that cannot be easily
changed. All those which have many implications are in this area. For
example, people think that adding a server is easy, but it clearly is not.
The table-based LB algorithms already compute the largest table size when
all servers are up, according to their respective weights. Changing one
weight or adding one server can increase their least common multiple and
require reallocating and rebuilding a complete table. Also, servers are
checked, and for the checks we reserve file descriptors. We cannot easily
change the max number of file descriptors on the fly either. What can be
done however is to reserve some spare slots for adding new servers into an
existing backend.
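To make the least-common-multiple point concrete, here is a toy sketch in Python (deliberately not HAProxy's actual table code, and the weights are invented for illustration): the LCM of the server weights can jump sharply when a single server is added, which is exactly why the table may need to be reallocated and rebuilt.

```python
from math import gcd
from functools import reduce

def lcm(values):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)

# Three servers with weights 2, 3 and 4: an LCM of 12 is enough
# to express the weight ratios exactly in a fixed-size table.
print(lcm([2, 3, 4]))       # 12

# Add one weight-5 server and the LCM jumps to 60: the whole
# table would have to be reallocated and rebuilt.
print(lcm([2, 3, 4, 5]))    # 60
```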

Also, having worked for many years with various products which support
on-line configuration changes, I've lost count of the number of days,
weeks or months spent troubleshooting strange issues caused purely by side
effects of these on-line changes, issues that simply went away after a
reboot. I'm not even blaming them because it's very hard to propagate
changes correctly. It always reminds me of a math professor I had at
university who could spot a mistake in an equation as large as the
blackboard, fix it there at the top and propagate the fix down to other lines.
The covered area looked like a pyramid. Here it's the same, performing a
minor change at the top of the configuration needs to take care of many
tiny implications far away from where the change is performed. And I'm
definitely not going to reproduce the lack of reliability that many products
can have just for the sake of allowing on-line reconfiguration.

I'd rather invest more time ensuring that we can seamlessly reload (eg: not
lose stick-tables, stats nor server checks), so that sensible changes are
made that way rather than through trickier runtime changes.
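For what it's worth, the stick-table part can already be addressed today: a local "peers" section lets the old process push its stick-table entries to the freshly reloaded one before exiting. A minimal sketch (names and addresses below are placeholders):

```
# Sketch (placeholder names/addresses): with a local peers section,
# the old process pushes its stick-table entries to the new one on reload.
peers mypeers
    peer lb1 127.0.0.1:1024     # must match the local hostname (or -L)

backend app
    stick-table type ip size 200k expire 30m peers mypeers
    stick on src
    server srv1 192.0.2.10:80 check
```

Stats and server-check states, on the other hand, still start from scratch on each reload.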

> and at the same time *dump* the new configuration to haproxy.conf, while
> on startup haproxy.conf.OK was created.

I would love to have this, I've been dreaming about it for about 10 years
in order to ease config migrations. With this we could also get rid of a
number of emulated features. However there are some difficulties caused by
the fact that some features are inherited from the defaults config while
others are explicitly present in the section, resulting in 3 possible
output modes :
  - flattened (without defaults anymore)
  - simplified (without everything inherited from defaults)
  - normal : keep everything and only resolve inside a section

Having studied this for a long time, I have an idea of how hard a job
it is. And contrary to common belief, it's about as hard as parsing the
config.
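To illustrate the three modes on a made-up snippet (a purely hypothetical config, not output from any existing tool):

```
# Input:
defaults
    mode http
    timeout client 30s

frontend fe
    bind :80
    timeout client 60s          # overrides the default

# "flattened" dump: the defaults section disappears, everything inlined:
frontend fe
    mode http                   # inherited, now explicit
    bind :80
    timeout client 60s

# "simplified" dump: only what differs from the defaults remains:
frontend fe
    bind :80
    timeout client 60s
```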

> The same way OpenLDAP manages
> its configuration. This will be very useful in environments where
> servers register themselves with a service (a backend in this case) based
> on health-checks which run locally or via a centralized service. Oh yes, I
> am talking about Zookeeper integration.
> 
> In setups where you have N HAProxy servers for serving the same site[1],
> reducing the number of health-checks is very important.

We worked for a long time on a centralized health-check project at
Exceliance a while ago, but the set of features we could implement in
the centralized checks at the time resulted in something less capable than
what haproxy could already do (eg: deal with slowstart/soft-stop, report
errors for logs, perform tracking, ...). So we had to go back to the
drawing board after a few months of work, and now the project is almost
buried :-(

> We have been running HAProxy with ~450 backends and ~3000 total servers.
> The number of health-checks was so high that it was causing issues on
> the firewalls; oh yes, we had firewalls between HAProxy and the servers.

You're not the only one to have firewalls there, don't worry. BTW, the
worst health-check setup I've seen had 20000 health-checks per second
sent from each haproxy node, and there were *a lot* of nodes. It was
ugly, but I definitely know what it feels like to deal with these numerous
checks !
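As a rough back-of-the-envelope sketch (the node count and check interval below are invented for illustration, not figures from either setup), the probe load seen by the firewalls scales as nodes x servers / interval:

```python
def checks_per_second(nodes, servers, inter_s):
    """Aggregate probe rate: each LB node checks every server
    once per 'inter_s' seconds, independently of the others."""
    return nodes * servers / inter_s

# e.g. 5 load-balancer nodes, 3000 servers, a 2s check interval:
print(checks_per_second(5, 3000, 2.0))  # 7500.0 probes/s through the firewalls
```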

Regards,
Willy

