On 2018/5/2 16:29, Patrick Hemmer wrote:
>
>
> On 2018/5/2 13:22, Willy Tarreau wrote:
>> On Wed, May 02, 2018 at 12:44:06PM -0400, Patrick Hemmer wrote:
>>> Can you elaborate on what you're thinking of for a time-based queue?
>>>
>>> What I'm imagining you mean is that you would write a rule to set the
>>> max queue time, and haproxy would insert it into the queue sorting on
>>> TIME_NOW() + MAX_QUEUE_TIME. The main difference I see to this approach
>>> vs scoring is that you ensure that an item doesn't sit on the queue
>>> forever (or whatever `timeout queue` is set to) if higher priority stuff
>>> keeps getting inserted before it.
>> In part, but mostly that under contention you can still maintain a
>> good quality of service. I remember having had a discussion a very
>> long time ago with a gaming site operator explaining that he wanted
>> to send profile management requests to a dedicated slow server so
>> that people filling in their profile do not disturb the ones playing.
>> For me it's a good example: "how long do you think these people are
>> willing to wait for the server to respond to each click to update
>> their profile?". Let's say 2 seconds is OK; you just add a 2s offset
>> to the position in the queue for such requests. If in the meantime
>> other higher priority requests come in, the delayed one may face the
>> 2 seconds delay. But it will not face more. And obviously if no other
>> request is pending before it in the queue it will be served earlier.
>>
>> A similar example is for certain shopping sites which consider that
>> once you have something in your shopping cart, you're much more likely
>> to finish the process and to pay because you've found the item you
>> were looking for, so you're more willing to tolerate a slightly higher
>> response time, and as such provide a better response time to newcomers.
>> But while this small delay can easily be expressed in milliseconds
>> (probably 50/100), it's almost impossible to define it using positions.
>> What if 50 new visitors suddenly come in? You don't want the guy with
>> his item to suddenly experience a timeout. The time-based queue however
>> grants you control over the service degradation you're accepting to
>> inflict on your users.
> There's a key difference in our designs. You're trying to classify
> traffic based on destination. I'm trying to classify traffic based on
> source.
> In my design, the maxconn is set to a level such that the server is
> healthy and every request can be processed fast. With this, and a
> score-based queue, and under attack, the good requests will be drained
> fast, leaving only the bad requests. Thus no normal user should be
> getting a timeout, no matter what resource they're going to.
> Your design works when the backend system can't keep up with good
> legitimate traffic, and you need to prioritize one good request over
> another good request. My design is for when the backend system can't keep
> up with bad traffic, and so we want to process the good traffic first.
>
>> Another interesting example is when you want to provide strong service
>> guarantees to some top-level visitors while running under intense load.
>> By using only a position, it's easy to say "I want the Gold class to
>> advance by 1000 positions". But if the site is highly loaded and you
>> have 10k pending requests, these 1000 positions remain irrelevant,
>> because the request sits there, waiting for 90% of the pollution to
>> be flushed.
> I think you're misunderstanding my design, as scoring wouldn't work
> like this at all. If you give the gold class a score of 1000 (where
> a higher number means higher priority), then the only thing that would
> get processed before gold class would be another class that has a
> score > 1000. If nothing does, and a gold request comes in, it gets
> processed first, no matter how big the queue is.
> Think of scoring as instead having multiple queues. Each queue is only
> processed if the queue before it is empty.
>> If you say "I want these requests to advance by 10 seconds"
>> then no matter how many other requests you have in the queue, it will
>> really advance by 10 seconds and effectively shrink the response time
>> by 10 seconds.
> It may shrink the response time by 10 seconds, but if the normal
> response time is 60 seconds, that's still 50 seconds of waiting. If
> your users are only willing to put up with 30 seconds of wait, this
> means you end up failing *everyone*.
>
>> Overall for user experience it's important to think in time and not
>> positions. The rationale behind this is simple: the user will never
>> accept as a justification for varying quality of service the number of
>> other visitors. And positions just depend on this; it's an average number
>> of people you're allowing to pass over. If I'm granted a pass ticket
>> allowing me to advance 10 places in the cinema queue and I find a
>> crowded street, I will go back home. If I'm told I won't wait more
>> than 5 minutes, I'll come regardless of the number of waiters.
> But a time-based queue wouldn't work like this. You can't guarantee
> service within 5 minutes. What happens if a ton of people arrive at
> the same time who were told "you will never wait more than 5 minutes",
> and the cinema can't handle them all? Or if you're the only person in
> the line and nobody leaves the cinema for 6 minutes? Or there are
> already 100 people in line who were told "you will never wait more
> than 1 hour", and that hour is almost up?
>
>>> I don't think this is necessarily a good thing.
>> For having considered the two options several times in the field over the
>> last decade when sites started to crawl, I became very well convinced ;-)
>>
>>> If you're under a DOS
>>> attack, the goal is to get the good requests processed before any
>>> possible malicious requests. With a time-based queue, those malicious
>>> requests will still get processed and starve out the good requests.
>> Precisely not that much, because the good ones will pass before, and
>> the malicious ones will then be subject to the queue timeout if there
>> are too many.
>>
>>> For
>>> example, let's say you're under attack, a bad request comes in with
>>> max_queue_time=1000ms, and then after 999ms elapse, a good request comes
>>> in with max_queue_time=10ms. You have a good request and a bad request
>>> in the queue, but HAProxy is going to process the bad request first
>>> because its timer is expiring first.
>> Absolutely, but you fell into the trap of not considering that the queue
>> is supposed to be full, so you're in fact comparing only two requests,
>> while in practice you have many of them and there the effect is much
>> more visible.
> I don't think so. Server connections are full, not the queue. The
> point I was trying to make is that you have a request you are pretty
> confident is bad, and you waste resources processing it while the
> request you think is good gets to wait. No matter how big the queue
> is, as long as a bad request sits in front of a good request in the
> queue (regardless of whether the good request chronologically came after
> the bad, or the bad request chronologically came after the good), this applies.
>>> Essentially if haproxy is receiving
>>> X good requests per second, and Y bad requests per second, it's still
>>> going to forward X good per second, and Y bad per second, to the backend
>>> server. The only difference is that they're time shifted.
>> Except once subject to the queue timeout. It's a very important aspect we
>> must not dismiss.
>>
>>> The other thing I could think you mean by time-based is to insert into
>>> the queue sorting on MAX_QUEUE_TIME, just like a score-based queue, but
>>> you would still record TIME_NOW() + MAX_QUEUE_TIME, and would reject
>>> requests that don't get processed by their deadline. Essentially a
>>> per-request version of the `timeout queue` setting. But I don't see any
>>> real value in this.
>>>
>>> Or do you mean something else?
>> What I mean is that the queue is indexed on the computed service time,
>> which is (current_time + offset), where offset can be either positive or
>> negative, null by default. The queue is sorted by service time, just like
>> any timer in fact. Then the queue collection simply picks the next event
>> in the queue.
>>
>> Please also note that right now the queue already considers the time to
>> maintain fairness between requests targeting a specific server and those
>> who just want any server. This is what ensures no request starves in either
>> the server's queue or the backend's queue. With a time-sorted queue, instead
>> of picking the tasks according to their time of arrival, we'd simply pick
>> them based on the service time, and we could maintain an excellent fairness
>> between servers and backends.
>>
>> Regarding the DOS situations, there are certain aspects which slightly
>> differ from the points around the quality of service. DOS fighting involves
>> sacrifice, and basic quality of service hardly provides this by default. In
>> our case, sacrifice happens on a queue size or time. And the decision should
>> depend on the request's importance, which is often very different from the
>> desired response time. The well-known example is the "Pay" button on many
>> sites: you click on it, and if it doesn't respond fast, you will not press
>> Escape. You're keeping your fingers crossed hoping it will not return an
>> error, even if it takes 30 seconds. And moreover, you're happy once it
>> responds!
>> Here that's what makes the difference with the response time: in fact you'd
>> like to say that certain requests are highly sensitive but not urgent, and
>> that you'd like to be able to increase their timeout and ultimately get
>> served.
>>
>> To fight DOS it's the same. Commonly, many sites consider that requests
>> holding a valid cookie correspond to authenticated users and score much
>> better in terms of trustability. Thus by having adjustable timeouts, it
>> would make it very easy to adjust the queue timeout for a request based
>> on the presence of a cookie or not.
>>
>> Now when you think about it, a reasonable but simple strategy for some of
>> the examples above could be summarized to this:
>>
>> timeout queue 10s
>>
>> # pay button not urgent but needs to work
>> http-request set-timeout queue 60s if PAY_BUTTON
>> http-request set-priority -10s if PAY_BUTTON
>>
>> # unauthenticated less urgent, can be a DOS
>> http-request set-priority -1s if NO_COOKIE || FAVICON
>>
>> # some elements that need to be delivered quickly
>> http-request set-priority 1s if INDEX_HTML || CSS || JS || ICONS
>>
>> # auto-completion needs to be fast but no problem if it doesn't work
>> http-request set-timeout queue 1s if AUTO_COMPLETION_REQUEST
>> http-request set-priority 10s if AUTO_COMPLETION_REQUEST
>>
>> # inconsistent user-agents are suspicious, delay them just in case
>> http-request set-priority -20s if SUSPICIOUS_USER_AGENT
>>
>> # too fast users need to slow down a little bit
>> http-request set-priority -20s if { sc0_rate gt 10 } || { sc0_err_cnt gt 10 }
>>
>> With a proper API we can even imagine having access to these request
>> properties from Lua to implement far more advanced policies.
>>
>> Regards,
>> Willy
> Here's an example scenario that a time-based queue cannot handle:
> Let's say you have a server that can only process 10 requests per
> second. Your normal traffic is 2 requests per second (500ms between
> requests). Then you start getting an attack of 50 requests per second from
> SUSPICIOUS_USER_AGENT (20ms between requests). The queue will end up with
> 520 requests (50rps*10s + 2rps*10s; the 10s is the timeout). Then an
> INDEX_HTML request comes in. With a `set-priority 1s`, you advance it
> up 1s in the queue, which puts it ahead of 52 requests, but there are
> still 468 requests in front of it, which means it'll be 46.8s before
> it is processed, and with `timeout queue 10s`, it times out. The only
> request which will work is the PAY_BUTTON, and that's because of the
> increased timeout, as all the SUSPICIOUS_USER_AGENT requests in front
> of it will time out first, and at prio -10s, it'll take 20s to complete.
>
> Let's go through this same scenario with a scoring system. Let's assume
> that in `set-priority Xs`, the "Xs" becomes "X", where positive means
> higher priority.
> The queue will have 500 requests pending (this is different from the
> 520 above; see the end for an explanation). Then an INDEX_HTML request
> comes in. With `set-priority 1`, because all the SUSPICIOUS_USER_AGENT
> requests have priority -20, the INDEX_HTML goes to the front of the
> queue and gets processed as soon as a connection is freed. Let's also
> assume a NO_COOKIE comes in; with priority -1, it'll go right after
> the INDEX_HTML request. And then because SUSPICIOUS_USER_AGENT is
> constantly at the end of the queue, all the good requests are
> processed in near-normal time. The only wait is for a connection slot
> to open up. And if the normal traffic rate is 2 requests per second
> and the server can handle 10 per second, there should never be more than 1
> good request in the queue (except if they come in at the same time).
> This is also why at the beginning, the queue only has 500 requests
> pending (the only thing in the queue is SUSPICIOUS_USER_AGENT, so
> 50rps*10s=500).
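[Editor's note] The score-based ordering walked through above can be sketched with a small priority heap. This is an illustrative model only, not HAProxy code: requests sort by priority class first and arrival order second, so a single higher-priority request bypasses any number of lower-priority ones regardless of queue depth.

```python
import heapq
import itertools

# Monotonic arrival counter: ties within the same priority class are
# served in arrival (FIFO) order.
_arrival = itertools.count()

def push(queue, priority, label):
    # heapq is a min-heap, so negate the score (higher score = served first).
    heapq.heappush(queue, (-priority, next(_arrival), label))

def pop(queue):
    return heapq.heappop(queue)[2]

queue = []
for _ in range(500):                  # 50 rps * 10 s of attack traffic
    push(queue, -20, "SUSPICIOUS_USER_AGENT")
push(queue, 1, "INDEX_HTML")          # arrives last, still served first
push(queue, -1, "NO_COOKIE")

print(pop(queue))  # INDEX_HTML
print(pop(queue))  # NO_COOKIE
```

Note that the 500 SUSPICIOUS_USER_AGENT entries never delay the two good requests; they only ever compete with each other at the back of the queue.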
>
> -Patrick
>
>

After re-reading this, and thinking through your design again, I think my understanding of your time-based queuing proposal, and thus my example for it, was wrong. The earliest SUSPICIOUS_USER_AGENT request would come in at T=0s, and so it would go in the queue with key=20 (key = T - priority). Then when INDEX_HTML comes in at T=10s (as the oldest SUSPICIOUS_USER_AGENT is timing out), it goes into the queue with key=9, which sorts before key=20.
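[Editor's note] The keying just described (key = T - priority, served in ascending key order) can be sketched as follows; this is a model of the proposal under discussion, not HAProxy code:

```python
import heapq

def enqueue(queue, now, priority_offset, label):
    # Queue key is arrival time minus the priority offset, so a positive
    # offset advances a request and a negative offset delays it.
    heapq.heappush(queue, (now - priority_offset, label))

queue = []
enqueue(queue, 0, -20, "SUSPICIOUS_USER_AGENT")  # key = 0 - (-20) = 20
enqueue(queue, 10, 1, "INDEX_HTML")              # key = 10 - 1 = 9

print(heapq.heappop(queue)[1])  # INDEX_HTML: key 9 sorts before key 20
```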
This is better, but still problematic: let's say you had a rule that did `set-priority -15s`. Then at T=10s, it would get queued at key=25s. The admin has judged it to be more important than SUSPICIOUS_USER_AGENT (prio -15s vs. prio -20s), but the server is still going to waste time processing all those SUSPICIOUS_USER_AGENT requests. While this is not what I would desire, I can see there might be a use case for it. So how about something that would support both:

    # insert into queue with a time-based key
    http-request set-priority date(20) if SUSPICIOUS_USER_AGENT

and

    # insert into queue with a relative key
    http-request set-priority int(20) if SUSPICIOUS_USER_AGENT

The queue is processed in ascending order, so this should handle both.

I would also propose a sample fetch for retrieving the priority, as well as a Lua method for retrieving & setting it, so that you can perform relative adjustments. E.g.:

    http-request set-priority priority,add(5) if FOO

-Patrick
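[Editor's note] A hedged sketch of the dual-mode proposal above, with assumed semantics (the `date()`/`int()` names come from the proposal, but their exact behaviour was not yet specified): `date(n)` keys the request at now+n, so it is time-shifted but still ages toward the front, while `int(n)` keys it at the constant n, giving a pure class-based ordering. Both land in one queue served in ascending key order.

```python
import heapq
import itertools

# Arrival counter so equal keys are served in FIFO order.
_arrival = itertools.count()

def set_priority_date(queue, now, offset, label):
    # Time-based key: service position ages forward as "now" advances.
    heapq.heappush(queue, (now + offset, next(_arrival), label))

def set_priority_int(queue, value, label):
    # Relative (class) key: a constant, independent of arrival time.
    heapq.heappush(queue, (value, next(_arrival), label))

def serve(queue):
    return heapq.heappop(queue)[2]

queue = []
set_priority_date(queue, 0, 20, "SUSPICIOUS_USER_AGENT")  # key = 20
set_priority_date(queue, 10, 0, "NORMAL")                 # key = 10
set_priority_int(queue, 5, "GOLD_CLASS")                  # key = 5

print(serve(queue))  # GOLD_CLASS
print(serve(queue))  # NORMAL
print(serve(queue))  # SUSPICIOUS_USER_AGENT
```

In this model, small int() keys behave like the score-based classes discussed earlier (always served before larger keys), while date() keys preserve the "delay by n seconds, but no more" property of the time-based design.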