On Wed, May 30, 2018 at 10:00:24AM +0200, Willy Tarreau wrote:
> I noticed a strange effect which is that when injecting under low load with
> a higher priority (either offset or class) than another high level traffic,
> the response time on the higher priority traffic follows a sawtooth shape,
> it progressively raises from 0 to 50-80ms and suddenly drops to zero again.

OK I found what causes this, and it totally makes sense. It's due to the
fact that I'm using two independent injectors, one requesting a 10ms page
and the other one requesting a 100ms page and able to fill the queue. Each
time a slow request is dequeued, that's one less slot available for a quick
request, so the average service time increases, resulting in a higher
average wait time in the queue for a slot to free up. As fast requests are
slowed down, there are more opportunities to add slow ones, hence to slow
down the service further, until the point where 100% of the slow requests
are being served in parallel, leaving none of them in the queue, which is
then filled with the fast ones only. As soon as all these slow requests
complete, all the fast ones are served immediately, resulting in a much
faster service time for all of them, and progressively the slow ones come
back again.

So this is completely normal and expected in this test. It's just not
intuitive.
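
For those who want to play with this feedback loop, here is a minimal
stand-alone toy model of that test (not haproxy code, and all the numbers
below are made up): a few closed-loop "fast" clients requesting a 10ms
page with strict priority over many closed-loop "slow" clients requesting
a 100ms page, all sharing a server limited to MAXCONN concurrent requests.
It prints the queue wait of each completed fast request over time; the
exact shape of the resulting curve of course depends on the parameters:

#include <stdio.h>

#define MAXCONN    10   /* server slots (assumed)                    */
#define NFAST       3   /* concurrent fast, higher-priority clients  */
#define NSLOW      30   /* concurrent slow clients                   */
#define FAST_SVC   10   /* fast page service time, ms                */
#define SLOW_SVC  100   /* slow page service time, ms                */
#define FAST_THINK 30   /* pause before a fast client sends again    */
#define SIM_MS  20000   /* simulated duration, ms                    */

enum { THINKING, QUEUED, SERVED };
struct client { int state, timer, wait; };

int main(void)
{
    struct client fast[NFAST], slow[NSLOW];
    int busy = 0;

    for (int i = 0; i < NFAST; i++)
        fast[i] = (struct client){ QUEUED, 0, 0 };
    for (int i = 0; i < NSLOW; i++)
        slow[i] = (struct client){ QUEUED, 0, 0 };

    for (int t = 0; t < SIM_MS; t++) {
        /* complete requests whose service time has elapsed */
        for (int i = 0; i < NFAST; i++)
            if (fast[i].state == SERVED && --fast[i].timer == 0) {
                printf("%d %d\n", t, fast[i].wait); /* time, queue wait */
                fast[i].state = THINKING;
                fast[i].timer = FAST_THINK;
                busy--;
            }
        for (int i = 0; i < NSLOW; i++)
            if (slow[i].state == SERVED && --slow[i].timer == 0) {
                slow[i].state = QUEUED;  /* slow injector resends at once */
                slow[i].wait = 0;
                busy--;
            }
        /* fast clients come back once their think time has elapsed */
        for (int i = 0; i < NFAST; i++)
            if (fast[i].state == THINKING && --fast[i].timer == 0) {
                fast[i].state = QUEUED;
                fast[i].wait = 0;
            }
        /* fill the free slots, higher priority (fast) requests first */
        for (int i = 0; i < NFAST && busy < MAXCONN; i++)
            if (fast[i].state == QUEUED) {
                fast[i].state = SERVED;
                fast[i].timer = FAST_SVC;
                busy++;
            }
        for (int i = 0; i < NSLOW && busy < MAXCONN; i++)
            if (slow[i].state == QUEUED) {
                slow[i].state = SERVED;
                slow[i].timer = SLOW_SVC;
                busy++;
            }
        /* queued requests accumulate wait time */
        for (int i = 0; i < NFAST; i++)
            if (fast[i].state == QUEUED)
                fast[i].wait++;
        for (int i = 0; i < NSLOW; i++)
            if (slow[i].state == QUEUED)
                slow[i].wait++;
    }
    return 0;
}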

The way to combat this is to add another setting which we currently don't
have: the maximum load a server may have for a given request to be served
by it, otherwise the request is forced to wait in the queue. For example,
if we say that the slow requests cannot use more than 90% of a server's
connections, there will always be 10% available for the other ones, thus
completely eliminating the queue for them. It's a bit trickier to implement
because it requires that when we dequeue pendconns, if we find one which
doesn't validate the server's load, we try the next one, and this can be
expensive, especially since most of the time there will be very few
requests allowed to use the server to the max. A speedup would be
necessary, involving a two-dimensional tree lookup, or maybe a higher bit
field containing the server's available slots (two's complement of the
entry above, looked up from maxconn-currconn).
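
To make the idea a bit more concrete, here is a rough sketch (not actual
haproxy code: the pendconn/server structures are simplified stand-ins and
the per-request max_load field is hypothetical) of what the dequeue path
would have to do with such a setting, including the possibly expensive
linear skip mentioned above:

#include <stdio.h>

/* Hypothetical per-request load cap: the highest share (in percent) of
 * the server's maxconn this request may occupy; uncapped requests use
 * 100. These structures are simplified, not the real haproxy ones.
 */
struct pendconn_sk {
    struct pendconn_sk *next;
    int max_load;              /* e.g. 90 for the slow class */
};

struct server_sk {
    int cur_conns;             /* connections currently served */
    int maxconn;               /* configured connection limit  */
    struct pendconn_sk *queue; /* pending requests, highest priority first */
};

/* Return the first pending request the server's current load allows,
 * or NULL. The linear scan over skipped entries is precisely the
 * expensive part mentioned above that a two-dimensional tree lookup
 * (or a key derived from maxconn - cur_conns) would have to avoid.
 */
static struct pendconn_sk *dequeue_allowed(struct server_sk *srv)
{
    struct pendconn_sk **prev = &srv->queue;

    for (struct pendconn_sk *p = srv->queue; p; prev = &p->next, p = p->next) {
        /* serving <p> brings the server to cur_conns+1 connections;
         * allowed only if that stays within max_load percent of maxconn
         */
        if ((srv->cur_conns + 1) * 100 <= p->max_load * srv->maxconn) {
            *prev = p->next;   /* unlink from the queue */
            srv->cur_conns++;
            return p;
        }
    }
    return NULL;
}

int main(void)
{
    /* server at 9/10 connections; the queue head is a slow request
     * capped at 90%, behind it an uncapped fast one: the slow one is
     * skipped and the fast one gets the last slot.
     */
    struct pendconn_sk fastreq = { NULL, 100 };
    struct pendconn_sk slowreq = { &fastreq, 90 };
    struct server_sk srv = { 9, 10, &slowreq };
    struct pendconn_sk *p = dequeue_allowed(&srv);

    printf("picked max_load=%d, cur_conns=%d/%d\n",
           p ? p->max_load : -1, srv.cur_conns, srv.maxconn);
    return 0;
}

With maxconn=10 and max_load=90, the check never grants the 10th slot to a
capped request, which is exactly the 90%/10% split from the example above.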

That's possibly something to think about in the future but it needs further
investigation.

Cheers,
Willy
