On 2018/5/2 13:22, Willy Tarreau wrote:
> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>> Can you elaborate on what you're thinking of for a time-based queue?
>>
>> What I'm imagining you mean is that you would write a rule to set the
>> max queue time, and haproxy would insert it into the queue sorting on
>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>> vs scoring, is that you ensure that an item doesn't sit on the queue
>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>> keeps getting inserted before it.
> In part, but mostly that under contention you can still maintain a
> good quality of service. I remember having had a discussion a very
> long time ago with a gaming site operator explaining that he wanted
> to send profile management requests to a dedicated slow server so
> that people filling in their profile do not disturb the ones playing.
> For me it's a good example : "how long do you think these people are
> willing to wait for the server to respond to each click to update
> their profile?". Let's say 2 seconds is OK, you just add a 2s offset
> to the position in the queue for such requests. If in the mean time
> other higher priority requests come in, the delayed one may face the
> 2 seconds delay. But it will not face more. And obviously if no other
> request is pending before it in the queue it will be served earlier.
>
> A similar example is for certain shopping sites which consider that
> once you have something in your shopping cart, you're much more likely
> to finish the process and to pay because you've found the item you
> were looking for, so you're more willing to tolerate a slightly higher
> response time, and as such provide a better response time to newcomers.
> But while this small delay can easily be expressed in milliseconds
> (probably 50/100), it's almost impossible to define it using positions.
> What if 50 new visitors suddenly come in ? You don't want the guy with
> his item to suddenly experience a timeout. The time-based queue however
> grants you a control over the service degradation you're accepting to
> inflict to your users.
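
The time-offset queue described above can be sketched roughly as follows. This is only an illustration of the idea (a heap keyed on service time), not HAProxy's actual implementation; names and numbers are made up:

```python
# Sketch of a time-based queue: requests are ordered by service time,
# i.e. arrival_time + offset, where the offset comes from rules like
# "set-priority -10s". A negative offset makes a request look "older"
# so it is served sooner; a positive one delays it by a bounded amount.
import heapq
import itertools

class TimeQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO among equal keys

    def push(self, req, now, offset=0.0):
        heapq.heappush(self._heap, (now + offset, next(self._seq), req))

    def pop(self):
        return heapq.heappop(self._heap)[2]

q = TimeQueue()
q.push("profile-update", now=10.0, offset=2.0)  # may tolerate 2s extra delay
q.push("game-click", now=10.5)                  # arrives later, no offset
print(q.pop())  # "game-click" is served first: key 10.5 < 12.0
```

Note the delay is bounded: the profile update can be overtaken only by requests whose service time falls within its 2-second window, never indefinitely.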
There's a key difference in our designs. You're trying to classify
traffic based on destination. I'm trying to classify traffic based on
source.
In my design, maxconn is set to a level at which the server stays
healthy and every request is processed quickly. With that and a
score-based queue, under attack the good requests are drained quickly,
leaving only the bad ones. Thus no normal user should hit a timeout, no
matter which resource they request.
Your design works when the backend can't keep up with legitimate
traffic, and you need to prioritize one good request over another. My
design is for when the backend can't keep up with bad traffic, and we
want to process the good traffic first.

>
> Another interesting example is when you want to provide strong service
> guarantees to some top-level visitors while running under intense load.
> By using only a position, it's easy to say "I want the Gold class to
> advance by 1000 positions". But if the site is highly loaded and you
> have 10k pending requests, these 1000 positions remain irrelevant,
> because the request sits there, waiting for 90% of the pollution to
> be flushed.
I think you're misunderstanding my design, as scoring wouldn't work like
this at all. If you give the gold class a score of 1000 (where higher
number means higher priority), then the only thing that would get
processed before gold class would be another class that has a score >
1000. If nothing does, and a gold request comes in, it gets processed
first, no matter how big the queue is.
Think of scoring as instead having multiple queues. Each queue is only
processed if the queue before it is empty.
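
That multi-queue view of scoring can be sketched like this (a toy illustration of the model described above, not a proposed implementation; class names are made up):

```python
# Scoring modeled as one FIFO per score: a lower-score queue is only
# drained when every higher-score queue is empty.
from collections import defaultdict, deque

class ScoreQueue:
    def __init__(self):
        self._classes = defaultdict(deque)  # score -> FIFO of requests

    def push(self, req, score=0):
        self._classes[score].append(req)

    def pop(self):
        # The highest non-empty score wins, no matter how deep the
        # lower-priority queues are.
        best = max(s for s, q in self._classes.items() if q)
        return self._classes[best].popleft()

q = ScoreQueue()
for i in range(10000):
    q.push(f"attack-{i}", score=-20)  # a flood of low-priority requests
q.push("gold", score=1000)
print(q.pop())  # "gold": queue depth does not matter, only the score
```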
> If you say "I want these requests to advance by 10 seconds"
> then no matter how many other requests you have in the queue, it will
> really advance by 10 seconds and effectively shrink the response time
> by 10 seconds.
It may shrink the response time by 10 seconds, but if the normal
response time is 60 seconds, that's still 50 seconds of waiting. If your
users are only willing to put up with 30 seconds of wait, this means you
end up failing *everyone*.

>
> Overall for user experience it's important to think in time and not
> positions. The rationale behind this is simple : the user will never
> accept as a justification for varying quality of service the number of
> other visitors. And positions just depend on this, it's an average number
> of people you're allowing to pass over. If I'm granted a pass ticket
> allowing me to advance 10 places in the cinema queue and I find a
> crowded street, I will go back home. If I'm told I won't wait more
> than 5 minutes, I'll come regardless of the number of waiters.
But a time based queue wouldn't work like this. You can't guarantee
service within 5 minutes. What happens if a ton of people arrive at the
same time who were told "you will never wait more than 5 minutes", and
the cinema can't handle them all? Or if you're the only person in the
line and nobody leaves the cinema for 6 minutes? Or there are already
100 people in line who were told "you will never wait more than 1 hour",
and that hour is almost up?

>
>> I don't think this is necessarily a good thing.
> For having considered the two options several times in the field over
> the last decade when sites started to crawl, I became very well convinced ;-)
>
>> If you're under a DOS
>> attack, the goal is to get the good requests processed before any
>> possible malicious requests. With a time based queue, those malicious
>> requests will still get processed and starve out the good requests.
> Precisely not that much because the good ones will pass before, and
> the malicious ones will then be subject to the queue timeout if there
> are too many.
>
>> For
>> example lets say you're under attack, a bad request comes in with
>> max_queue_time=1000ms, and then after 999ms elapse, a good request comes
>> in with max_queue_time=10ms. You have a good request, and a bad request
>> on the queue, but HAProxy is going to process the bad request first
>> because its timer is expiring first.
> Absolutely but you fell into the trap of not considering that the queue
> is supposed to be full, so you're in fact comparing only two requests,
> while in practice you have many of them and there the effect is much
> more visible.
I don't think so. The server connections are full, not the queue. The
point I was trying to make is that you have a request you're pretty
confident is bad, and you waste resources processing it while the
request you think is good has to wait. No matter how big the queue is,
this holds whenever a bad request sits ahead of a good request in the
queue, regardless of which one arrived first.
>
>> Essentially if haproxy is receiving
>> X good requests per second, and Y bad requests per second, it's still
>> going to forward X good per second, and Y bad per second, to the backend
>> server. The only difference is that they're time shifted.
> Except once subject to the queue timeout. It's a very important aspect we
> must not dismiss.
>
>> The other thing I could think you mean by time-based is to insert into
>> the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
>> you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
>> requests that don't get processed by their deadline. Essentially a
>> per-request version of the `timeout queue` setting. But I don't see any
>> real value in this.
>>
>> Or do you mean something else?
> What I mean is that the queue is indexed on the computed service time
> which is (current_time + offset), where offset can be either positive or
> negative, null by default. The queue is sorted by service time, just like
> any timer in fact. Then the queue collection simply picks the next event
> in the queue.
>
> Please also note that right now the queue already considers the time to
> maintain fairness between requests targeting a specific server and those
> who just want any server. This is what ensures no request starves in either
> the server's queue or the backend's queue. With a time-sorted queue, instead
> of picking the tasks according to their time of arrival, we'd simply pick
> them based on the service time, and we could maintain an excellent fairness
> between servers and backends.
>
> Regarding the DOS situations, there are certain aspects which slightly
> differ from the points around the quality of service. DOS fighting involves
> sacrifice, and basic quality of service hardly provides this by default. In
> our case, sacrifice happens on a queue size or time. And the decision should
> depend on the request's importance, which is often very different from the
> desired response time. The well known example is the "Pay" button on many
> sites : you click on it, and if it doesn't respond fast, you will not press
> Escape. You're keeping fingers crossed hoping it will not return an error,
> even if it takes 30 seconds. And moreover, you're happy once it responds!
> Here that's what makes the difference with the response time : in fact you'd
> like to say that certain requests are highly sensitive but not urgent, and
> that you'd like to be able to increase their timeout and ultimately get
> served.
>
> To fight DOS it's the same. Commonly, many sites consider that requests
> holding a valid cookie correspond to authenticated users and score much
> better in terms of trustability. Thus by having adjustable timeouts, it
> would make it very easy to adjust the queue timeout for a request based
> on the presence of a cookie or not.
>
> Now when you think about it, a reasonable but simple strategy for some of
> the examples above could be summarized to this :
>
>    timeout queue 10s
>    # pay button not urgent but needs to work
>    http-request set-timeout queue 60s if PAY_BUTTON
>    http-request set-priority -10s if PAY_BUTTON
>
>    # unauthenticated less urgent, can be a DOS
>    http-request set-priority -1s  if NO_COOKIE || FAVICON
>
>    # some elements that need to be delivered quickly
>    http-request set-priority  1s  if INDEX_HTML || CSS || JS || ICONS
>
>    # auto-completion needs to be fast but no problem if it doesn't work
>    http-request set-timeout queue 1s if AUTO_COMPLETION_REQUEST
>    http-request set-priority 10s  if AUTO_COMPLETION_REQUEST
>
>    # inconsistent user-agents are suspicious, delay them just in case
>    http-request set-priority -20s if SUSPICIOUS_USER_AGENT
>
>    # too fast users need to slow down a little bit
>    http-request set-priority -20s if { sc0_rate gt 10 } || { sc0_err_cnt gt 10 }
>
> With a proper API we can even imagine having access to these request
> properties from Lua to implement far more advanced policies.
>
> Regards,
> Willy
Here's an example scenario that a time based queue cannot handle:
Let's say you have a server that can only process 10 requests per second.
Your normal traffic is 2 requests per second (500ms between req). Then
you start getting an attack of 50 requests per second from
SUSPICIOUS_USER_AGENT (20ms between req). The queue will end up with 520
requests (50rps*10s + 2rps*10s; the 10s is the timeout). Then an
INDEX_HTML request comes in. With `set-priority 1s`, you advance it up
1s in the queue, which puts it ahead of 52 requests, but there's still
468 requests in front of it, which means it'll be 46.8s before it is
processed, and with `timeout queue 10s`, it times out. The only request
which will work is the PAY_BUTTON, and that's because of the increased
timeout, as all the SUSPICIOUS_USER_AGENT requests in front of it will
time out first, and at prio -10s, it'll take 20s to complete.
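
The arithmetic above can be checked with a quick back-of-envelope script (a toy steady-state model using the numbers from the scenario, not a simulation of HAProxy itself):

```python
# Back-of-envelope check of the time-based-queue scenario: a steady
# flood fills the queue up to the queue timeout, and a 1s priority
# offset only skips past one second's worth of arrivals.
timeout_queue = 10   # seconds (`timeout queue 10s`)
attack_rate = 50     # SUSPICIOUS_USER_AGENT req/s
normal_rate = 2      # legitimate req/s
serve_rate = 10      # server capacity, req/s

queued = (attack_rate + normal_rate) * timeout_queue
print(queued)        # 520 requests sitting in the time-sorted queue

# `set-priority 1s` on INDEX_HTML jumps ahead of 1s of arrivals (52
# requests), leaving the rest still in front of it.
ahead = queued - (attack_rate + normal_rate) * 1
wait = ahead / serve_rate
print(ahead, wait)   # 468 requests ahead, 46.8s wait > the 10s timeout
```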

Let's go through this same scenario with a scoring system. Let's assume
that in `set-priority Xs`, the "Xs" becomes "X", where positive means
higher priority.
The queue will have 500 requests pending (this is different from the 520
above; see the end for an explanation). Then an INDEX_HTML request comes in.
With `set-priority 1`, because all the SUSPICIOUS_USER_AGENT requests
have priority -20, the INDEX_HTML goes to the front of the queue and
gets processed as soon as a connection is freed. Let's also assume a
NO_COOKIE request comes in; with priority -1, it'll go right after the
INDEX_HTML request. And then because SUSPICIOUS_USER_AGENT is constantly
at the end of the queue, all the good requests are processed in
near-normal time. The only wait is for a connection slot to open up. And
if the normal traffic rate is 2 requests per second and the server can
handle 10 per second, there should never be more than 1 good request in
the queue (except if they come in at the same time). This is also why at
the beginning, the queue only has 500 requests pending (the only thing
in the queue is SUSPICIOUS_USER_AGENT, so 50rps*10s=500).
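
The same attack under the scoring model can be sketched with a single heap keyed on score (a toy model following the numbers in the text; not HAProxy code):

```python
# Scoring model of the attack: the flood sits at priority -20, so any
# request scored above it is served next regardless of queue depth.
import heapq
import itertools

seq = itertools.count()
heap = []  # entries: (-score, arrival_order, request)

def push(req, score):
    heapq.heappush(heap, (-score, next(seq), req))

for i in range(500):                 # 50 req/s * 10s of attack traffic
    push(f"suspicious-{i}", score=-20)
push("index.html", score=1)
push("no-cookie", score=-1)

# Drain order: good traffic first, attack traffic last.
print(heapq.heappop(heap)[2])  # index.html
print(heapq.heappop(heap)[2])  # no-cookie
```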

-Patrick

