On Fri, May 04, 2018 at 06:49:00PM -0400, Patrick Hemmer wrote:
> I'm not quite following the need for multiple queues. Why wouldn't you
> just have one sorted queue, where if multiple pending requests have the
> same priority, then they're FIFO.
That's what the time-ordered queue does. I proposed this so that you can
ensure that lower priorities are always processed first until there is no
more of the same level.

> I had sent another email which proposed an implementation that would
> satisfy both designs
> (https://www.mail-archive.com/haproxy@formilux.org/msg29876.html).
> Unfortunately I was composing it when you sent your most recent reply,
> so I didn't see it before I sent.

Thanks, I missed this one indeed.

> I went ahead and implemented the functionality I had talked about in
> that other email, and have attached a WIP patch. I've done some testing
> on it, and it seems to work fine. The syntax is exactly as proposed:
>
>   # time-based priority
>   http-request set-priority date(20) if LOW_PRIORITY
>   http-request set-priority date(-20) if HIGH_PRIORITY
>
>   # score-based priority
>   http-request set-priority int(20) if LOW_PRIORITY
>   http-request set-priority int(-20) if HIGH_PRIORITY
>
> Some notes on the implementation:
>
> I used a `long` for tracking the priority. Obviously this has
> limitations when used with date(), as the timestamps are fairly large. I
> can change this to `long long`.

Ah, now I understand how you wanted to implement it. I have a big concern
here with wrapping time however, which is what I explained in an earlier
mail where I said that if needed we could mix date and priority in the
same integer, but that would still require two lookups. Our date wraps
every 49.7 days (32-bit millisecond resolution), so depending on the
phases of the moon, some requests will be processed before the highest
priority ones or after, and you'll see some priority inversion 7 times a
year.

> The queue was implemented using the existing pendconns code. The only
> thing that was changed was to insert in sorted order. Obviously this is
> just a quick and dirty implementation. There are lots of priority queue
> implementations to choose from, and I don't know which one is best
> suited to this kind of task.
> We could keep the existing sorted
> linked-list, and just do some optimizations for insertion. Or we could
> switch to some sort of tree. The linked-list has the advantage in that I
> suspect that the vast majority of additions will append to the end, and
> pops are always off the front, so well suited to a linked list. A tree
> might become very unbalanced very fast, and require lots of maintenance.

No, trees are quite cheap, we use them absolutely everywhere now. Just
take a look at wake_expired_tasks() to get an idea. In fact, every single
time we've relied on lists doing linear lookups, we've received several
bug reports about haproxy eating all the CPU because some people were
using it in a way that wasn't initially considered reasonable but which
made a lot of sense to them. In the case of the list above, the corner
case would be a server with "maxconn 5000" and half of the requests
having a high priority and the other half a low one, resulting in an
average of 2500 visits in the optimal case where you can insert from both
sides (which is what we used to do a very long time ago in the scheduler).

> Using the date() fetch obviously doesn't provide good precision.

Hmmm, good point. Here it's seconds, so it will not wrap for the next 20
years but will provide useless precision.

> We could add a `datems()`, `datens()`, or `ticks()` (ms since start)
> sample fetch. But obviously we would need to use `long long` for that.
> Though ticks could use a `long`, but would wrap at around 16 months.

No, 49.7 days ;-)

The other problem I'm having here is that it artificially mixes two types
of totally unrelated data and relies on the lack of intersection between
their domains (when time doesn't wrap). Indeed, here you're considering
that values below a certain number cannot be dates, so they definitely
are fixed priorities. But that also means that people who want to use
ranges will randomly collide with dates and will observe changing
behaviour along the year.
In this case you'd be better off with the principle consisting in using a
larger integer with the priority at the top and the date at the bottom.
But that's where I said that it would require two lookups to deal with
the wrapping date, so I'm not sure it's any better than using multiple
queues. Well, at least it's less of a maintenance burden. The detailed
principle is the following :

    uint64_t priority :

      63                  32 31                     0
     [    hard priority    |          date          ]

    next = lookup_first(queue);
    if (!next)
        return;
    => next is the lowest value of both priority and date

Then you perform a second lookup with the oldest past date for the same
priority level :

    key = (next->key & 0xffffffff00000000ULL) | (now_ms - TIMER_LOOK_BACK);
    next2 = lookup_ge(queue, key);
    if (next2 && !((next2->key ^ next->key) >> 32))
        next = next2;

Then you dequeue next, which is always the next one. That's where I
thought that using two distinct levels was simpler, but if we factor in
the fact that it would limit the number of priority classes and would
require more maintenance code, it's probably not worth the hassle to save
the 4 extra lines we have above to respect the wrapping date.

Also I'm thinking that we can even use 32 bits by putting the frontier
between the date and the fixed priority (let's call it the "class")
somewhere else :

  - 8-bit class + 24-bit offset  => 256 classes and +/- 2.3-hour offsets
  - 12-bit class + 20-bit offset => 4096 classes and +/- 8-minute offsets
  - 16-bit class + 16-bit offset => 64k classes and +/- 32-second offsets

I'm pretty sure that the first two options are way more than enough to
cover almost everyone's needs.
Let's say for a moment that we'd use the second option (12+20 bits). The
queue insertion code becomes this :

    pendconn->node.key = (stream->prio_class << 20) +
                         ((now_ms + stream->prio_offset) & 0xFFFFF);
    if (srv)
        eb32_insert(&srv->pendconns, &pendconn->node);
    else
        eb32_insert(&px->pendconns, &pendconn->node);

And the dequeuing code in pendconn_process_next_strm(), which takes care
both of the backend's and server's queues while still respecting their
classes, would become (modulo locking, which is not trivial there but
which has to be preserved regardless of the solution) :

    srv_node = eb32_first(&srv->pendconns);
    prx_node = eb32_first(&prx->pendconns);
    if (!srv_node)
        return prx_node;
    if (!prx_node)
        return srv_node;

    if (srv_node->key >> 20 < prx_node->key >> 20)
        return srv_node;
    if (srv_node->key >> 20 > prx_node->key >> 20)
        return prx_node;

    /* classes are equal, pick oldest expiration first */
    key = (srv_node->key & 0xfff00000) +
          ((now_ms - (TIMER_LOOK_BACK >> 12)) & 0xFFFFF);

    srv_node2 = eb32_lookup_ge(&srv->pendconns, key);
    if (srv_node2 && !((srv_node2->key ^ srv_node->key) >> 20))
        srv_node = srv_node2;

    prx_node2 = eb32_lookup_ge(&prx->pendconns, key);
    if (prx_node2 && !((prx_node2->key ^ prx_node->key) >> 20))
        prx_node = prx_node2;

    /* take the first one in time order */
    if (((prx_node->key - srv_node->key) & 0xFFFFF) < 0x80000)
        return srv_node;
    else
        return prx_node;

All of this looks fairly reasonable to me and could possibly remove a bit
of the complexity in the current function, which has to dereference the
streams to find the queuing dates.
Then the documentation becomes pretty simple :

  - all requests from the lowest numbered priority classes are always
    processed first

  - within a same priority class, requests are processed in order of
    arrival, to which a positive or negative time offset can be applied

We could then have this in http-request and tcp-request :

    http-request set-priority-class int(x)
    http-request set-priority-offset int(x)

BTW, you were right to remind me about using a sample expression instead
of a static integer in the argument. I've been used to using static
values everywhere for years, but for such cases, being able to use
expressions is much more useful, as we could extract the value from a
header or cookie for example.

Willy