Hi William,

On Mon, Mar 19, 2018 at 06:57:50PM +0100, William Dauchy wrote:
> > However, be careful. This new implementation should be thread-safe
> > (hopefully...). But it is not optimal, and in some situations it could
> > really be slower in multi-threaded mode than in single-threaded mode.
> > The problem is that when we try to dequeue pending connections, we
> > process them from the oldest to the newest, independently of the
> > threads' affinity. So we need to wait for the other threads to wake up
> > to really process them. If threads are blocked in the poller, this adds
> > a significant latency. This problem happens when maxconn values are
> > very low.
> 
> Regarding this last section, we are a bit worried about the usability
> of the new `nbthread` feature in 1.8. It raised a few questions on our
> side:
> - Is it considered as an experimental feature?

Threading was clearly released with experimental status, just like
H2, because we knew we'd be facing some post-release issues in these
two areas, which are hard to get 100% right at once. However, I consider
that the situation has become much better, and as confirmation, both of
these are now enabled by default in HapTech's products. With this said,
I expect that over time we'll continue to see a few bugs, but no more
than what we're seeing in various other areas. For example, we haven't
had a single issue on haproxy.org since it was updated to 1.8.1 or so,
three months ago. So this is getting quite good.

> - Should we expect potential latencies side effects in some situations
> as described in your commit message? (and so avoid using it for
> low-latency usage)

I ran a stress test on this patch with a single server running with
"maxconn 1" and a frontend bound to two threads. I measure exactly
30000 conn/s with a single thread (keep in mind that there's a single
connection at once), and 28500 with two threads. Thus the sync point
adds on average an extra 1.75 microseconds (1/28500 s - 1/30000 s),
compared to the roughly 35 microseconds it takes on average to fully
process a request (connect, server processing, response, close).
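For reference, a minimal configuration along the lines of that test could look like this (a sketch only; addresses, ports and names are illustrative):

```
global
    nbthread 2              # two threads; the bind below is served by both

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend fe_test
    bind :8080
    default_backend be_test

backend be_test
    # a single connection slot: extra requests wait in the queue
    server srv1 127.0.0.1:8000 maxconn 1
```

With such a setup, both threads compete for the single server slot, which is exactly where the sync point cost shows up.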

Also, if you're running with nbproc > 1 instead, the maxconn setting is
not really respected, since it becomes per-process. When you run with
8 processes it doesn't mean much anymore, unless you use small maxconn
settings, implying that sometimes a process might queue some requests
while there are available slots in other processes. Thus I'd argue that
threads significantly improve the situation here by allowing all
connection slots to be used by all CPUs, which is a real improvement
that should theoretically show you lower latencies.
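To illustrate the per-process effect (again purely a sketch, with illustrative names):

```
global
    nbproc 8          # each process gets its own copy of the queue below

backend be_test
    # effectively up to 8 concurrent connections in total (1 per process),
    # and one process may queue requests while another has a free slot
    server srv1 127.0.0.1:8000 maxconn 1
```

With threads instead of processes, there is a single shared queue, so no slot sits idle while requests wait elsewhere.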

Note that if this is of interest to you, it's trivial to make haproxy
run in busy polling mode, and in this case the performance increases to
30900 conn/s, at the expense of eating all your CPU (which you possibly
don't care about if latency is your worst enemy). We could possibly
even improve this to ensure that it's done only when there are existing
sessions on a given thread. Let me know if this is something that could
be of interest to you, as I think we could make this configurable and
bypass the sync point in this case.
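If we did make it configurable, it would probably end up as a simple global directive, something like this (the name is purely hypothetical at this point):

```
global
    nbthread 2
    busy-polling        # hypothetical knob: spin on the poller instead of sleeping
```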

> - I saw some commits for 1.9 release which probably improves the
> situation about lockless threading
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=cf975d46bca2515056a4f55e55fedbbc7b4eda59
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=4815c8cbfe7817939bcac7adc18fd9f86993e4fc
> But I guess they will not be backported for 1.8, right?

No, they're definitely not for 1.8, and they're still really touchy.
We're progressively attacking locks wherever we can. Some further
patches will refine the scheduler to make it more parallel, and even
the code above will continue to change; see it as a first step in the
right direction.

We noticed a nice performance boost from the last one with many cores
(24 threads, something like +40% on connection rate), but we'll probably
see even better results once the rest is addressed.

Cheers,
Willy
