On 2018/5/2 16:29, Patrick Hemmer wrote:
>
>
> On 2018/5/2 13:22, Willy Tarreau wrote:
>> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>>> Can you elaborate on what you're thinking of for a time-based queue?
>>>
>>> What I'm imagining you mean is that you would write a rule to set the
>>> max queue time, and haproxy would insert it into the queue sorting on
>>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>>> vs scoring is that you ensure that an item doesn't sit on the queue
>>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>>> keeps getting inserted before it.
>> In part, but mostly that under contention you can still maintain a
>> good quality of service. I remember having had a discussion a very
>> long time ago with a gaming site operator explaining that he wanted
>> to send profile management requests to a dedicated slow server so
>> that people filling in their profile do not disturb the ones playing.
>> For me it's a good example: "how long do you think these people are
>> willing to wait for the server to respond to each click to update
>> their profile?". Let's say 2 seconds is OK; you just add a 2s offset
>> to the position in the queue for such requests. If in the meantime
>> other higher priority requests come in, the delayed one may face the
>> 2 seconds delay. But it will not face more. And obviously if no other
>> request is pending before it in the queue it will be served earlier.
>>
>> A similar example is for certain shopping sites which consider that
>> once you have something in your shopping cart, you're much more likely
>> to finish the process and to pay because you've found the item you
>> were looking for, so you're more willing to tolerate a slightly higher
>> response time, and as such provide a better response time to newcomers.
>> But while this small delay can easily be expressed in milliseconds
>> (probably 50/100), it's almost impossible to define it using positions.
>> What if 50 new visitors suddenly come in? You don't want the guy with
>> his item to suddenly experience a timeout. The time-based queue however
>> grants you control over the service degradation you're accepting to
>> inflict on your users.
> There's a key difference in our designs. You're trying to classify
> traffic based on destination. I'm trying to classify traffic based on
> source.
> In my design, the maxconn is set to a level such that the server is
> healthy and every request can be processed fast. With this, and a
> score-based queue, and under attack, the good requests will be drained
> fast, leaving only the bad requests. Thus no normal user should be
> getting a timeout, no matter what resource they're going to.
> Your design works when the backend system can't keep up with good
> legitimate traffic, and you need to prioritize one good request over
> another good request. My design is for when the backend system can't keep
> up with bad traffic, and so we want to process the good traffic first.
>
>> Another interesting example is when you want to provide strong service
>> guarantees to some top-level visitors while running under intense load.
>> By using only a position, it's easy to say "I want the Gold class to
>> advance by 1000 positions". But if the site is highly loaded and you
>> have 10k pending requests, these 1000 positions remain irrelevant,
>> because the request sits there, waiting for 90% of the pollution to
>> be flushed.
> I think you're misunderstanding my design, as scoring wouldn't work
> like this at all. If you give the gold class a score of 1000 (where
> a higher number means higher priority), then the only thing that would
> get processed before gold class would be another class that has a
> score > 1000. If nothing does, and a gold request comes in, it gets
> processed first, no matter how big the queue is.
> Think of scoring as instead having multiple queues. Each queue is only
> processed if the queue before it is empty.
>> If you say "I want these requests to advance by 10 seconds"
>> then no matter how many other requests you have in the queue, it will
>> really advance by 10 seconds and effectively shrink the response time
>> by 10 seconds.
> It may shrink the response time by 10 seconds, but if the normal
> response time is 60 seconds, that's still 50 seconds of waiting. If
> your users are only willing to put up with 30 seconds of wait, this
> means you end up failing *everyone*.
>
>> Overall for user experience it's important to think in time and not
>> positions. The rationale behind this is simple: the user will never
>> accept as a justification for varying quality of service the number of
>> other visitors. And positions just depend on this; it's an average number
>> of people you're allowing to pass over. If I'm granted a pass ticket
>> allowing me to advance 10 places in the cinema queue and I find a
>> crowded street, I will go back home. If I'm told I won't wait more
>> than 5 minutes, I'll come regardless of the number of waiters.
> But a time-based queue wouldn't work like this. You can't guarantee
> service within 5 minutes. What happens if a ton of people arrive at
> the same time who were told "you will never wait more than 5 minutes",
> and the cinema can't handle them all? Or if you're the only person in
> the line and nobody leaves the cinema for 6 minutes? Or there are
> already 100 people in line who were told "you will never wait more
> than 1 hour", and that hour is almost up?
>
>>> I don't think this is necessarily a good thing.
>> For having considered the two options several times in the field over the
>> last decade when sites started to crawl, I became very well convinced ;-)
>>
>>> If you're under a DOS
>>> attack, the goal is to get the good requests processed before any
>>> possible malicious requests. With a time-based queue, those malicious
>>> requests will still get processed and starve out the good requests.
>> Precisely not that much, because the good ones will pass before, and
>> the malicious ones will then be subject to the queue timeout if there
>> are too many.
>>
>>> For
>>> example, let's say you're under attack, a bad request comes in with
>>> max_queue_time=1000ms, and then after 999ms elapse, a good request comes
>>> in with max_queue_time=10ms. You have a good request and a bad request
>>> in the queue, but HAProxy is going to process the bad request first
>>> because its timer is expiring first.
>> Absolutely, but you fell into the trap of not considering that the queue
>> is supposed to be full, so you're in fact comparing only two requests,
>> while in practice you have many of them and there the effect is much
>> more visible.
> I don't think so. Server connections are full, not the queue. The
> point I was trying to make is that you have a request you are pretty
> confident is bad, and you waste resources processing it while the
> request you think is good gets to wait. No matter how big the queue
> is, as long as a bad request sits in front of a good request in the
> queue (regardless of whether the good request chronologically came after
> the bad, or the bad request chronologically came after the good), this applies.
>>> Essentially if haproxy is receiving
>>> X good requests per second, and Y bad requests per second, it's still
>>> going to forward X good per second, and Y bad per second, to the backend
>>> server. The only difference is that they're time shifted.
>> Except once subject to the queue timeout. It's a very important aspect we
>> must not dismiss.
>>
>>> The other thing I could think you mean by time-based is to insert into
>>> the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
>>> you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
>>> requests that don't get processed by their deadline. Essentially a
>>> per-request version of the `timeout queue` setting. But I don't see any
>>> real value in this.
>>>
>>> Or do you mean something else?
>> What I mean is that the queue is indexed on the computed service time,
>> which is (current_time + offset), where offset can be either positive or
>> negative, null by default. The queue is sorted by service time, just like
>> any timer in fact. Then the queue collection simply picks the next event
>> in the queue.
>>
>> Please also note that right now the queue already considers the time to
>> maintain fairness between requests targeting a specific server and those
>> who just want any server. This is what ensures no request starves in either
>> the server's queue or the backend's queue. With a time-sorted queue, instead
>> of picking the tasks according to their time of arrival, we'd simply pick
>> them based on the service time, and we could maintain an excellent fairness
>> between servers and backends.
>>
>> Regarding the DOS situations, there are certain aspects which slightly
>> differ from the points around the quality of service. DOS fighting involves
>> sacrifice, and basic quality of service hardly provides this by default. In
>> our case, sacrifice happens on a queue size or time. And the decision should
>> depend on the request's importance, which is often very different from the
>> desired response time. The well-known example is the "Pay" button on many
>> sites: you click on it, and if it doesn't respond fast, you will not press
>> Escape. You're keeping your fingers crossed hoping it will not return an
>> error, even if it takes 30 seconds. And moreover, you're happy once it
>> responds!
>> Here that's what makes the difference with the response time: in fact you'd
>> like to say that certain requests are highly sensitive but not urgent, and
>> that you'd like to be able to increase their timeout and ultimately get
>> served.
>>
>> To fight DOS it's the same. Commonly, many sites consider that requests
>> holding a valid cookie correspond to authenticated users and score much
>> better in terms of trustability. Thus by having adjustable timeouts, it
>> would make it very easy to adjust the queue timeout for a request based
>> on the presence of a cookie or not.
>>
>> Now when you think about it, a reasonable but simple strategy for some of
>> the examples above could be summarized to this:
>>
>> timeout queue 10s
>>
>> # pay button not urgent but needs to work
>> http-request set-timeout queue 60s if PAY_BUTTON
>> http-request set-priority -10s if PAY_BUTTON
>>
>> # unauthenticated less urgent, can be a DOS
>> http-request set-priority -1s if NO_COOKIE || FAVICON
>>
>> # some elements that need to be delivered quickly
>> http-request set-priority 1s if INDEX_HTML || CSS || JS || ICONS
>>
>> # auto-completion needs to be fast but no problem if it doesn't work
>> http-request set-timeout queue 1s if AUTO_COMPLETION_REQUEST
>> http-request set-priority 10s if AUTO_COMPLETION_REQUEST
>>
>> # inconsistent user-agents are suspicious, delay them just in case
>> http-request set-priority -20s if SUSPICIOUS_USER_AGENT
>>
>> # too fast users need to slow down a little bit
>> http-request set-priority -20s if { sc0_rate gt 10 } || { sc0_err_cnt gt 10 }
>>
>> With a proper API we can even imagine having access to these request
>> properties from Lua to implement far more advanced policies.
>>
>> Regards,
>> Willy
> Here's an example scenario that a time-based queue cannot handle:
> Let's say you have a server that can only process 10 requests per
> second. Your normal traffic is 2 requests per second (500ms between
> requests). Then you start getting an attack of 50 requests per second from
> SUSPICIOUS_USER_AGENT (20ms between requests). The queue will end up with
> 520 requests (50rps*10s + 2rps*10s; the 10s is the timeout). Then an
> INDEX_HTML request comes in. With a `set-priority 1s`, you advance it
> up 1s in the queue, which puts it ahead of 52 requests, but there are
> still 468 requests in front of it, which means it'll be 46.8s before
> it is processed, and with `timeout queue 10s`, it times out. The only
> request which will work is the PAY_BUTTON, and that's because of the
> increased timeout, as all the SUSPICIOUS_USER_AGENT requests in front
> of it will time out first, and at prio -10s, it'll take 20s to complete.
>
> Let's go through this same scenario with a scoring system. Let's assume
> that in `set-priority Xs`, the "Xs" becomes "X", where positive means
> higher priority.
> The queue will have 500 requests pending (this is different from the
> 520 above; see the end for an explanation). Then an INDEX_HTML request
> comes in. With `set-priority 1`, because all the SUSPICIOUS_USER_AGENT
> requests have priority -20, the INDEX_HTML goes to the front of the
> queue and gets processed as soon as a connection is freed. Let's also
> assume a NO_COOKIE comes in; with priority -1, it'll go right after
> the INDEX_HTML request. And then because SUSPICIOUS_USER_AGENT is
> constantly at the end of the queue, all the good requests are
> processed in near-normal time. The only wait is for a connection slot
> to open up. And if the normal traffic rate is 2 requests per second
> and the server can handle 10 per second, there should never be more than 1
> good request in the queue (except if they come in at the same time).
> This is also why at the beginning, the queue only has 500 requests
> pending (the only thing in the queue is SUSPICIOUS_USER_AGENT, so
> 50rps*10s=500).
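[Editor's note] The score-based ordering walked through above can be sketched with a small priority heap. This is an illustrative model only, not HAProxy code: requests sort by priority class first and arrival order second, so a single higher-priority request bypasses any number of lower-priority ones regardless of queue depth.

```python
import heapq
import itertools

# Monotonic arrival counter: ties within the same priority class are
# served in arrival (FIFO) order.
_arrival = itertools.count()

def push(queue, priority, label):
    # heapq is a min-heap, so negate the score (higher score = served first).
    heapq.heappush(queue, (-priority, next(_arrival), label))

def pop(queue):
    return heapq.heappop(queue)[2]

queue = []
for _ in range(500):                  # 50 rps * 10 s of attack traffic
    push(queue, -20, "SUSPICIOUS_USER_AGENT")
push(queue, 1, "INDEX_HTML")          # arrives last, still served first
push(queue, -1, "NO_COOKIE")

print(pop(queue))  # INDEX_HTML
print(pop(queue))  # NO_COOKIE
```

Note that the 500 SUSPICIOUS_USER_AGENT entries never delay the two good requests; they only ever compete with each other at the back of the queue.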
>
> -Patrick
>
>

After re-reading this, and thinking through your design again, I think my understanding of your time-based queuing proposal, and thus my example for it, was wrong. The earliest SUSPICIOUS_USER_AGENT request would come in at T=0s, and so it would go in the queue with key=20 (key = T - priority). Then when INDEX_HTML comes in at T=10s (as the oldest SUSPICIOUS_USER_AGENT is timing out), it goes into the queue with key=9, which sorts before key=20.
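[Editor's note] The keying just described (key = T - priority, served in ascending key order) can be sketched as follows; this is a model of the proposal under discussion, not HAProxy code:

```python
import heapq

def enqueue(queue, now, priority_offset, label):
    # Queue key is arrival time minus the priority offset, so a positive
    # offset advances a request and a negative offset delays it.
    heapq.heappush(queue, (now - priority_offset, label))

queue = []
enqueue(queue, 0, -20, "SUSPICIOUS_USER_AGENT")  # key = 0 - (-20) = 20
enqueue(queue, 10, 1, "INDEX_HTML")              # key = 10 - 1 = 9

print(heapq.heappop(queue)[1])  # INDEX_HTML: key 9 sorts before key 20
```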
This is better, but still problematic: let's say you had a rule that did `set-priority -15s`. Then at T=10s, it would get queued at key=25s. The admin has judged it to be more important than SUSPICIOUS_USER_AGENT (prio -15s vs. prio -20s), but the server is still going to waste time processing all those SUSPICIOUS_USER_AGENT requests. While this is not what I would desire, I can see there might be a use case for it. So how about something that would support both:

    # insert into queue with a time-based key
    http-request set-priority date(20) if SUSPICIOUS_USER_AGENT

and

    # insert into queue with a relative key
    http-request set-priority int(20) if SUSPICIOUS_USER_AGENT

The queue is processed in ascending order, so this should handle both.

I would also propose a sample fetch for retrieving the priority, as well as a Lua method for retrieving & setting it, so that you can perform relative adjustments. E.g.:

    http-request set-priority priority,add(5) if FOO

-Patrick
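[Editor's note] A hedged sketch of the dual-mode proposal above, with assumed semantics (the `date()`/`int()` names come from the proposal, but their exact behaviour was not yet specified): `date(n)` keys the request at now+n, so it is time-shifted but still ages toward the front, while `int(n)` keys it at the constant n, giving a pure class-based ordering. Both land in one queue served in ascending key order.

```python
import heapq
import itertools

# Arrival counter so equal keys are served in FIFO order.
_arrival = itertools.count()

def set_priority_date(queue, now, offset, label):
    # Time-based key: service position ages forward as "now" advances.
    heapq.heappush(queue, (now + offset, next(_arrival), label))

def set_priority_int(queue, value, label):
    # Relative (class) key: a constant, independent of arrival time.
    heapq.heappush(queue, (value, next(_arrival), label))

def serve(queue):
    return heapq.heappop(queue)[2]

queue = []
set_priority_date(queue, 0, 20, "SUSPICIOUS_USER_AGENT")  # key = 20
set_priority_date(queue, 10, 0, "NORMAL")                 # key = 10
set_priority_int(queue, 5, "GOLD_CLASS")                  # key = 5

print(serve(queue))  # GOLD_CLASS
print(serve(queue))  # NORMAL
print(serve(queue))  # SUSPICIOUS_USER_AGENT
```

In this model, small int() keys behave like the score-based classes discussed earlier (always served before larger keys), while date() keys preserve the "delay by n seconds, but no more" property of the time-based design.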