On Wed, May 30, 2018 at 10:00:24AM +0200, Willy Tarreau wrote:
> I noticed a strange effect which is that when injecting under low load with
> a higher priority (either offset or class) than another high level traffic,
> the response time on the higher priority traffic follows a sawtooth shape,
> it progressively raises from 0 to 50-80ms and suddenly drops to zero again.
OK I found what causes this, and it totally makes sense. It's due to the
fact that I'm using two independent injectors, one requesting a 10ms page
and the other one requesting a 100ms page and able to fill the queue.

Each time a slow request is dequeued, it's one less slot available for a
quick request, so the average service time increases, resulting in a
higher average wait time in the queue before the first slot frees up. As
fast requests are slowed down, there are more opportunities to add slow
ones, hence to slow down the service, until the point where 100% of the
slow requests are being served in parallel, resulting in none of them in
the queue, which is filled with the fast ones. As soon as all these slow
requests complete, all the fast ones are served immediately, resulting in
a much faster service time for all of them, and progressively the slow
ones come back. So this is completely normal and expected in this test.
It's just not intuitive.

The way to combat this is to add another setting which we currently don't
have: the maximum load a server may have for a given request to be served
directly; above that load the request is forced into the queue. For
example, if we say that the slow requests cannot use more than 90% of a
server's connections, there will always be 10% available for the other
ones, thus completely eliminating the queue for them.

It's a bit trickier to implement because it requires that when we dequeue
pendconns, if we find one which doesn't satisfy the server's load limit,
we try the next one, and this can be expensive, especially since most of
the time there will be very few requests allowed to use the server to the
max. A speedup would be necessary, involving a two-dimensional tree
lookup, or maybe a higher bit field containing the server's available
slots (two's complement of the entry above, looked up from
maxconn - currconn). That's possibly something to think about in the
future but it needs further investigation.

Cheers,
Willy
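
P.S. To make the dequeue part above more concrete, here is a rough sketch
of what such a check could look like. This is not haproxy code: it only
borrows the pendconn name, the fields and the dequeue_allowed() function
are made up for illustration, and it shows only the linear scan described
above, not the tree-based speedup.

    struct pendconn {
        int max_load_pct;        /* e.g. 90 for slow requests, 100 by default */
        struct pendconn *next;
    };

    struct server {
        int curconn;
        int maxconn;             /* assumed > 0 */
        struct pendconn *queue;  /* simplistic FIFO of pending requests */
    };

    /* Return the first queued request allowed to run given the server's
     * current load, unlinked from the queue, or NULL if none qualifies.
     */
    static struct pendconn *dequeue_allowed(struct server *srv)
    {
        struct pendconn **prev = &srv->queue;
        struct pendconn *p;

        for (p = *prev; p; prev = &p->next, p = *prev) {
            /* load the server would reach if this request is admitted */
            int load_pct = (srv->curconn + 1) * 100 / srv->maxconn;

            if (load_pct <= p->max_load_pct) {
                *prev = p->next;  /* unlink it and serve it */
                srv->curconn++;
                return p;
            }
            /* otherwise leave it queued and try the next one */
        }
        return NULL;
    }

With something along these lines, fast requests would carry
max_load_pct = 100 and slow ones 90, so a slow request is only admitted
while the server stays below 90% of its slots and the remaining 10% stay
available for the fast class.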

