On Fri, Sep 25, 2020 at 03:57:58PM +0200, Lukasz Tasz wrote:
> first is opposite to leastconn - first keep load at the beginning,
> leastconn same everywhere.
Indeed.
> but why source is limiting balancing only to one machine?
Because in almost every case people want to use it to maintain affinity.
For example, you don't want the control and data connections of an FTP
client to go to different servers.
> would it be possible to remove this limitation?
It's partly done by hash-balance-factor, which forces the algorithm
to pick a different server when the one initially chosen has too high
a load.
However, completely removing a hashed server from the pool every time it
reaches its limit would have serious consequences in terms of CPU usage.
At high loads a server should constantly run at its limit, which means
it would constantly be added to and removed from the pool. This can happen
hundreds of thousands of times per second. Adding or removing a hashed
server means:
- recomputing the whole farm's hash for static algorithms
- inserting or removing a large number of instances of this server
from the consistent hash tree when using consistent hashing
Both are very expensive, and while they're acceptable for the rare
cases where a server goes up or down, they're really not for when
a server reaches its limit or drops back below it.
Also, in small doses, queuing is good and can even be faster than
spreading the load across more servers, because it makes more efficient
use of network packets and TCP connections by merging requests with
ACKs, and can deliver a request to a server that has the code to handle
it hot in its cache, with a CPU already running at top speed. But that's
only true for small queue sizes, of course.
Before we had hash-balance-factor, some people achieved something
similar to what you describe by using two backends: one with the hash
LB algorithm, and another one with RR, random, or anything else. They
would choose which backend to use depending on the average queue size
in the first backend, using the avg_queue(backend) sample fetch.
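As a rough sketch, such a two-backend setup could look like this
(server names, addresses and the queue threshold are purely
illustrative, not from this discussion):

```
frontend fe_main
    bind :8080
    # Illustrative threshold: fall back to the round-robin backend
    # when the hash backend's average queue exceeds 2 requests.
    use_backend be_rr if { avg_queue(be_hash) gt 2 }
    default_backend be_hash

backend be_hash
    balance uri
    hash-type consistent
    server s1 192.0.2.11:80 maxconn 100
    server s2 192.0.2.12:80 maxconn 100

backend be_rr
    balance roundrobin
    server s1 192.0.2.11:80 maxconn 100
    server s2 192.0.2.12:80 maxconn 100
```

The trade-off is that requests routed via be_rr temporarily lose
affinity, which is acceptable only while the hash backend is saturated.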
Now with hash-balance-factor, I really think you can reach your
goal, because usually if one server has requests in its queue, either
all of them do, and you don't care which server you pick, or one does
not, and hash-balance-factor will spot it and use it instead. Just
run it with a very low value, e.g. 101, and that should be OK.
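In configuration terms, that suggestion would amount to something like
the following (server names and addresses are made up for illustration;
hash-balance-factor requires consistent hashing):

```
backend be_hash
    balance uri
    hash-type consistent
    # With a factor of 101, a server is skipped as soon as its load
    # exceeds 101% of the farm's average, i.e. almost immediately.
    hash-balance-factor 101
    server s1 192.0.2.11:80 maxconn 100
    server s2 192.0.2.12:80 maxconn 100
```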
If that's not sufficient, I think we could imagine adding a "no-queue"
option to the balance keyword to indicate that when the selected server
is full, the request should instead be queued in the backend. It would
still require significant changes to make sure that such a request can
*first* be attempted on another server before being queued, but I think
it might be possible. An alternative would be to have a modifier on the
consistent hash algorithms to try to find a different server when the
selected one has some queue. The problem is that we need to limit the
number of attempts, because we don't want to loop forever if all servers
have some queue. Maybe that could actually be a variant of
hash-balance-factor asking it to systematically ignore servers with a
queue.
regards,
Willy