Hi Godbach,
On Sat, Jul 06, 2013 at 02:14:53PM +0800, Godbach wrote:
> I have tested dev17 again without the commit 6e0644339f3b and used
> MALLOC in pool_refill_alloc() instead of CALLOC.
>
> The result is nearly the same as that in dev7.
Great, thank you for confirming. Again, you must keep in mind that
it does not mean haproxy uses more memory on new versions, but that
it uses the memory it is supposed to be using when the traffic is
high.
> On 2013/7/6 5:08, Willy Tarreau wrote:
>
> > Oh yes I got it now. That's because we now use calloc instead of malloc
> > during the initialization of the structs in order to guarantee that we
> > won't have any platform-specific behaviour anymore with regards to the
> > struct contents upon startup. I remember seeing some bugs that could be
> > reproduced on uclibc only and not glibc, just because the former passed
us some buffers that had previously been used while the latter gave us a
> > clean one. This difference in behaviour made the bugs much harder to
> > diagnose.
>
> Did you mean that some bugs which cannot be reproduced with glibc may
> be caused by memory allocated with malloc? Can some potential bugs be
> expected if I use MALLOC in dev17 again?
No, that's not the case. But sometimes we've had bugs with new features
that were added and for which we forgot to initialize a pointer. These
bugs only appeared in some environments, because some libc would provide
you with a dirty buffer and others with a clean one. The clean one would
hide the bug and make it hard to diagnose. So now we ensure that we don't
rely on the libc behaviour, and we even have the ability to put some dirt
in the buffer before delivering it (-dM) to make these bugs appear earlier.
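The idea behind zero-initialization and -dM poisoning can be sketched as follows. This is a hypothetical simplification, not haproxy's actual pool code; the names `mem_poison_byte` and `pool_alloc_sketch` are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only, not haproxy's real allocator.
 * mem_poison_byte < 0 means poisoning is disabled.
 */
static int mem_poison_byte = -1; /* set to e.g. 0x50 when poisoning */

void *pool_alloc_sketch(size_t size)
{
    void *ptr;

    if (mem_poison_byte >= 0) {
        /* fill the area with a known dirty pattern so any read of an
         * uninitialized field misbehaves deterministically and the bug
         * shows up early, on every platform */
        ptr = malloc(size);
        if (ptr)
            memset(ptr, mem_poison_byte, size);
    } else {
        /* zero the area so behaviour no longer depends on what the
         * libc hands back (clean on one libc, dirty on another) */
        ptr = calloc(1, size);
    }
    return ptr;
}
```

Either way, the contents of a freshly allocated struct no longer depend on the libc, which is the point.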
BTW, dev17 had some bugs which were fixed in dev18 and dev19. Right now,
dev19 is considered safer even if younger.
> > So in fact with or without poisoning, we fully initialize all structs
> > now, whether or not the requests make use of them. That also means that
> > my occasional tests at full load are useless since the memory is really
> > allocated :-)
>
> Yes, once initialized, the memory will be fully held by haproxy. As
> mentioned, it seems that haproxy should provide a mechanism to free
> memory unused for a long time and give it back to the system.
It has this, when you send a SIGQUIT (I believe), or when it does a soft
stop, it releases everything it can. However, I absolutely refuse to see
it release the memory it doesn't use, because :
  - if it managed to use this memory, it means it may need it again
    (otherwise you'd have reduced maxconn, no?)
- if it releases memory there is no guarantee it can get it again : if
no other process needs the memory, there is no point releasing it. If
other processes need it, then it might be missing for when the load
comes in.
  - with most libc implementations, free() returns nothing to the system
    until the top of the heap itself is released, which you can never
    guarantee. That's why in
production, I use dlmalloc instead. It makes use of mmap, and when
doing a free(), you just punch holes everywhere in the memory space
and you really release memory. That's useful when the process is
being replaced by a new one.
> Otherwise
> haproxy will hold the memory all the time except when exiting or
> receiving a QUIT signal.
Yes and that's on purpose. I remember here on this list, someone asked
if we could implement a "safe" option to preallocate the maximum memory
needed before opening the service, in order to guarantee that resources
will be available. We've not implemented it, but I've already done this
during debugging sessions.
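The preallocation idea mentioned above could be sketched like this. All names and sizes here are hypothetical, not haproxy's; the point is simply to grab everything maxconn could require before the service starts accepting traffic, so an allocation failure under load becomes impossible:

```c
#include <stdlib.h>

/* Hypothetical free-list preallocation sketch. session_size is assumed
 * to be at least sizeof(struct slot). */
struct slot { struct slot *next; };

static struct slot *free_list;

int prealloc_sessions(int maxconn, size_t session_size)
{
    for (int i = 0; i < maxconn; i++) {
        struct slot *s = calloc(1, session_size);
        if (!s)
            return -1; /* refuse to start: resources not guaranteed */
        s->next = free_list;
        free_list = s;
    }
    return 0; /* every possible session is already backed by memory */
}
```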
> > We've already seen some people reaching the million of concurrent
> > connections at least in benchmarks, and I remember about 400k in
> > production. So per-connection memory usage is critical in such
> > environments.
> >
> In my test, with 8K buffers and 3.4G of free memory, more than 150k
> concurrent connections can be reached.
You need to be careful about the socket buffers as well. I tend to
suggest allocating memory this way :
- 1/3 for haproxy
- 1/3 for socket buffers
- 1/3 for the rest of the system and to help soft reloads
So when you have an 8GB machine, you can assign 2.6 GB to haproxy. With
8kB buffers, you have around 17kB per session (18kB with HTTP logs).
2.6GB/17kB = 160k sessions. The socket buffers will allocate a minimum
of 4kB per side and per direction, so 16kB per session, or 2.5 GB min.
Given the cost of memory nowadays, if you're running a site which needs
to support 150k concurrent connections, I'd suggest not being cheap and
using a server with more than 8GB of memory to avoid the burden of tuning
the system too tightly.
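The sizing rule above can be written down as a small calculation; the per-session figures (17 kB with 8 kB buffers, 16 kB of socket buffers) are taken directly from the text:

```c
#include <stddef.h>

/* 1/3 of RAM goes to haproxy, and each session costs per_session_kb
 * (about 17 kB with 8 kB buffers, 18 kB with HTTP logs). */
static unsigned long max_sessions(unsigned long ram_kb,
                                  unsigned long per_session_kb)
{
    unsigned long haproxy_kb = ram_kb / 3;
    return haproxy_kb / per_session_kb;
}

/* The kernel allocates at least 4 kB per side and per direction for
 * socket buffers, i.e. 16 kB per session. */
static unsigned long sockbuf_kb(unsigned long sessions)
{
    return sessions * 16;
}
```

On an 8GB machine this gives 8*1024*1024/3/17 kB, roughly 160k sessions, and about 2.5 GB of socket buffers, matching the figures above.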
Best regards,
Willy