Hi William,

On Mon, Mar 19, 2018 at 06:57:50PM +0100, William Dauchy wrote:
> > However, be careful. This new implementation should be thread-safe
> > (hopefully...). But it is not optimal, and in some situations it could be
> > really slower in multi-threaded mode than in single-threaded mode. The
> > problem is that, when we try to dequeue pending connections, we process
> > them from the oldest to the newest, independently of the threads'
> > affinity. So we need to wait for the other threads to wake up to really
> > process them. If threads are blocked in the poller, this will add a
> > significant latency. This problem happens when maxconn values are very
> > low.
>
> Regarding this last section, we are a bit worried about the usability
> of the new `nbthread` feature in 1.8. It raised a few questions on our
> side:
> - Is it considered an experimental feature?
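For reference, the situation the commit message warns about can be reproduced with a configuration along these lines. This is my own minimal sketch, not taken from the original mail; the names and addresses are illustrative, and the `process 1/1-2` bind syntax assumes the 1.8 thread notation:

```
# Hypothetical reproduction config for the low-maxconn threading case
global
    nbthread 2                 # 1.8 threading, experimental at the time

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend fe
    bind :8080 process 1/1-2   # frontend bound to two threads
    default_backend be

backend be
    # Very low per-server maxconn: excess requests are queued, which is
    # exactly the case where cross-thread dequeuing can add latency.
    server srv1 127.0.0.1:8000 maxconn 1
```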
Threading was clearly released with an experimental status, just like H2, because we knew we'd be facing some post-release issues in these two areas that are hard to get 100% right at once. However, I consider that the situation has become much better, and to confirm this, both of these are now enabled by default in HapTech's products. With this said, I expect that over time we'll continue to see a few bugs, but not more than what we're seeing in various other areas. For example, we haven't had a single issue on haproxy.org since it was updated to 1.8.1 or so, 3 months ago. So this is getting quite good.

> - Should we expect potential latency side effects in some situations
> as described in your commit message? (and so avoid using it for low
> latency usage)

I ran a stress test on this patch, with a single server running with "maxconn 1" and a frontend bound to two threads. I measured exactly 30000 conn/s with a single thread (keep in mind that there's a single connection at once), and 28500 with two threads. Thus the sync point takes on average an extra 1.75 microseconds, compared to the 35 microseconds it takes on average to finish processing the request (connect, server processing, response, close).

Also, if you're running with nbproc > 1 instead, the maxconn setting is not really respected since it becomes per-process. When you run with 8 processes it doesn't mean much anymore, or you need to use small maxconn settings, implying that sometimes a process might queue some requests while there are available slots in other processes. Thus I'd argue that the threads significantly improve the situation here by allowing all connection slots to be used by all CPUs, which is a real improvement that should theoretically give you lower latencies.
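The 1.75 µs figure falls straight out of the measured connection rates. A quick back-of-the-envelope check (my own arithmetic, not part of the original mail):

```python
# Per-request cost derived from the measured connection rates above.
single = 30000   # conn/s with a single thread
dual = 28500     # conn/s with two threads

per_req_single_us = 1e6 / single            # ~33.3 us per request
per_req_dual_us = 1e6 / dual                # ~35.1 us per request
extra_us = per_req_dual_us - per_req_single_us

print(round(per_req_dual_us, 1))   # total time per request, ~35 us
print(round(extra_us, 2))          # sync-point overhead, ~1.75 us
```

With a single connection in flight at any time, the inverse of the connection rate is exactly the end-to-end time per request, which is why the two numbers in the mail can be compared this way.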
Note that if this is of interest to you, it's trivial to make haproxy run in busy polling mode, and in this case the performance increases to 30900 conn/s, at the expense of eating all your CPU (which you possibly don't care about if latency is your worst enemy). We could possibly even improve this to ensure that it's done only when there are existing sessions on a given thread. Let me know if this is something that could be of interest to you, as I think we could make it configurable and bypass the sync point in this case.

> - I saw some commits for the 1.9 release which probably improve the
> situation regarding lockless threading
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=cf975d46bca2515056a4f55e55fedbbc7b4eda59
> http://git.haproxy.org/?p=haproxy.git;a=commit;h=4815c8cbfe7817939bcac7adc18fd9f86993e4fc
> But I guess they will not be backported to 1.8, right?

No, they're definitely not for 1.8 and are still really touchy. We're progressively attacking locks wherever we can. Some further patches will refine the scheduler to make it more parallel, and even the code above will continue to change; see it as a first step in the right direction. We noticed a nice performance boost with the last one on many cores (24 threads, something like +40% on connection rate), but we'll probably see even better results once the rest is addressed.

Cheers,
Willy