On 16.08.2018 20:15, Willy Tarreau wrote:
> Quite frankly, at the moment (in 2018) I see little value in using that many
> threads or even processes.

You're right; I didn't mention how many processes we're actually running
on these boxes or how things are managed. Up to about 20 Gbps we were
fine running both haproxy (nbproc 4) and the NIC IRQs on the same first
cores of the same NUMA node (0-21 according to the NIC's local_cpulist).
Under higher load, however, idle_pct dropped quite low because those
cores were busy handling network interrupts (actually only 16 of them,
due to the number of RX queues on the NIC, but that's another story).
So the idea wasn't to have haproxy run on all of the higher cores
(44-65), but to use affinity to move it to the upper range of cores on
the same NUMA node and take load off the IRQ-heavy ones.
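Concretely, the kind of pinning I have in mind is just something along
these lines in the global section (a rough sketch only; the core numbers
are specific to our layout and nothing special):

    global
        nbproc 4
        # keep haproxy off the IRQ-heavy cores (0-21) and pin each
        # process to one of the upper cores of the same NUMA node
        cpu-map 1 44
        cpu-map 2 45
        cpu-map 3 46
        cpu-map 4 47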

> Now it should be technically possible to combine nbproc and nbthread.
> The code was designed for this though it purposely doesn't offer you the
> greatest flexibility regarding the possibilities to configure process+thread
> affinity. But using this it should theoretically allow you to scale up to 4096
> threads (which is totally ridiculous).

I haven't tested nbthread yet - only been lurking on this mailing list
for the past few months to see how it goes - but it's definitely high on
my list (even though it's not pressing, since nbproc works just fine).
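If I read the docs right, the first thing I'd try for combining the two
would be roughly this (an untested sketch; the core ranges are again
just an example for one NUMA node on our boxes):

    global
        nbproc 2
        nbthread 8
        # 2 processes x 8 threads = 16 workers in total; each process
        # (and therefore its threads) stays on its own block of cores
        # of the NUMA node, away from the IRQ-heavy ones
        cpu-map 1 44-51
        cpu-map 2 52-59

That would keep the per-process model we already know while letting each
process spread over a few more cores.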

> Just to give you some reference numbers, in the past I managed to reach
> 60 Gbps of forwarded traffic using a 4-core system, and a bit more recently
> I reached 520k connections/s using only 8 processes. I'd claim that anyone
> running above such numbers should seriously think about spreading the load
> over multiple boxes because a single failure starts to cost a lot.

We're currently spreading traffic across multiple boxes for load sharing
and failover, and thanks to ECMP/BGP that works very nicely.

> Just my two cents

Thanks for those!
Regards,
J.
