On 2018/5/5 01:29, Willy Tarreau wrote:
> On Fri, May 04, 2018 at 06:49:00PM -0400, Patrick Hemmer wrote:
>> I'm not quite following the need for multiple queues. Why wouldn't you
>> just have one sorted queue, where if multiple pending requests have the
>> same priority, then they're FIFO.
> That's what the time-ordered queue does. I proposed this so that you can
> ensure that lower priorities are always processed first until there is no
> more of the same level.
Ah, I was considering the two solutions as an either-or choice; I wasn't
expecting you'd use both in the same configuration, or at least not in the
same backend.

>> The queue was implemented using the existing pendconns code. The only
>> thing that was changed was to insert in sorted order. Obviously this is
>> just a quick and dirty implementation. There are lots of priority queue
>> implementations to choose from, and I don't know which one is best
>> suited to this kind of task. We could keep the existing sorted
>> linked-list and just optimize insertion, or we could switch to some sort
>> of tree. The linked-list has the advantage that I suspect the vast
>> majority of additions will append to the end, and pops are always off
>> the front, both of which suit a linked list well. A tree might become
>> very unbalanced very fast, and require lots of maintenance.
> No, trees are quite cheap, we use them absolutely everywhere now. Just
> take a look at wake_expired_tasks() to get an idea. In fact every single
> time we've been relying on lists doing linear lookups, we've got several
> bug reports about haproxy eating all the CPU because some people were
> using it in a way that wasn't initially considered as reasonable but
> which made a lot of sense to them.
This is why I liked the idea of a single implementation. It gives the user
the flexibility to do whatever they want (time- or class-based). Granted,
I was only thinking of doing one or the other, but the user could still do
both; they would just have to dedicate part of the value's range to the
date and part to the class. Maybe someone would even want to rank date
before class.
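Just to make that concrete, with a single wide key the user would pick
which dimension dominates, something like this (purely illustrative; 'cls'
is whatever class value the user assigns, now_ms is haproxy's millisecond
clock, and I'm ignoring the wrapping question here):

    /* class-major: every class-2 request drains before any class-3 one */
    uint64_t key_class_first = ((uint64_t)cls << 32) | (uint32_t)now_ms;

    /* date-major: arrival time dominates, class only breaks ties */
    uint64_t key_date_first  = ((uint64_t)now_ms << 16) | (uint16_t)cls;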

> In this case you'd be better off with the principle of using a larger
> integer with the priority at the top and the date at the bottom.
> But that's where I said that it would require two lookups to deal with
> the wrapping date, so I'm not sure it's any better than using multiple
> queues. Well, at least it's less of a maintenance burden. The detailed
> principle is the following :
>
>    uint64_t priority :
>
>         63            32 31             0
>         [ hard priority |     date       ]
>
>    next = lookup_first(queue);
>    if (!next)
>       return;
>  
>    => next is the lowest value of both priority and date
>
> Then you perform a second lookup with the oldest past date for the same
> priority level :
>
>    key = (next->key & 0xffffffff00000000ULL) | (now_ms - TIMER_LOOK_BACK);
>    next2 = lookup_ge(queue, key);
>    if (next2 && !((next2->key ^ next->key) >> 32))
>       next = next2;
>
> Then you dequeue next which is always the next one.
>
> That's where I thought that using two distinct levels was simpler, but
> if we factor in the fact that it would limit the number of priority
> classes and would require more maintenance code, it's probably not
> worth the hassle to save the 4 extra lines we have above to respect
> the wrapping date.
>
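Just to check that I'm reading this right, with the real ebtree API the
dequeue would look roughly like this (untested sketch; I'm assuming the
pendconn node becomes an eb64_node, 'queue' is the relevant eb_root, and
the key layout is class in the top 32 bits, queue date in the bottom 32):

    struct eb64_node *next, *next2;
    uint64_t key;

    next = eb64_first(&queue);
    if (!next)
        return NULL;               /* nothing pending */

    /* next has the lowest class; now find the oldest date within that
     * same class, accounting for the wrapping 32-bit now_ms */
    key = (next->key & 0xffffffff00000000ULL) |
          (uint32_t)(now_ms - TIMER_LOOK_BACK);
    next2 = eb64_lookup_ge(&queue, key);
    if (next2 && !((next2->key ^ next->key) >> 32))
        next = next2;
    /* dequeue next */
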
> Also I'm thinking that we can even use a 32-bit key by putting the
> frontier between the date and the fixed priority (let's call it class)
> somewhere else :
>   - 8 bit class + 24 bit offset  => 256 classes and +/- 2.3 hours offsets
>   - 12 bit class + 20 bit offset => 4096 classes and +/- 8 minutes offsets
>   - 16 bit class + 16 bit offset => 64k classes and +/- 32 second offsets
What's the benefit to using 32-bit over 64-bit, and is that benefit
worth the cost of the added code complexity and processing time? If we
used 64-bit, we could do a 16/48 bit split and have an absolute
timestamp out to year 10889 at millisecond precision.
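Something like this is what I have in mind (rough sketch, not tested;
prio_class/prio_offset are the same hypothetical stream fields as in your
example, the node would have to become an eb64_node, and `date` is
haproxy's wall-clock timeval):

    /* 16/48 split: class in the top 16 bits, absolute milliseconds since
     * the epoch in the low 48 bits -- no wrap handling needed */
    uint64_t abs_ms = (uint64_t)date.tv_sec * 1000 + date.tv_usec / 1000;

    pendconn->node.key = ((uint64_t)stream->prio_class << 48) |
                         ((abs_ms + stream->prio_offset) & 0xFFFFFFFFFFFFULL);
    eb64_insert(&srv->pendconns, &pendconn->node);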

> I'm pretty sure that the first two options are more than enough to cover
> almost everyone's needs.
For me a 12-bit class should work. My intention is to implement
something similar to email spam filtering, where you generate a score
based on the rules that match, and some rules are worth more than
others. 8 bits might be a bit difficult to work within.
I also suspect the 16-bit option (+/- 32 second offset) is too small for
date-based use.

> Let's say for a moment that we'd use the second option
> (12+20 bits), the queue insertion code becomes this :
>
>    pendconn->node.key = (stream->prio_class << 20) +
>                         ((now_ms + stream->prio_offset) & 0xFFFFF);
>    if (srv)
>         eb32_insert(&srv->pendconns, &pendconn->node);
>    else
>         eb32_insert(&px->pendconns, &pendconn->node);
>
>
> And the dequeuing code in pendconn_process_next_strm(), which takes care
> of both the backend's and server's queues while still respecting their
> classes, would become (modulo locking, which is not trivial there but
> which has to be preserved regardless of the solution) :
>
>    srv_node = eb32_first(&srv->pendconns);
>    prx_node = eb32_first(&prx->pendconns);
>    if (!srv_node)
>       return prx_node;
>    if (!prx_node)
>       return srv_node;
>
>    if (srv_node->key >> 20 < prx_node->key >> 20)
>       return srv_node;
>
>    if (srv_node->key >> 20 > prx_node->key >> 20)
>       return prx_node;
>
>    /* classes are equal, pick oldest expiration first */
>    key = (srv_node->key & 0xfff00000) +
>          ((now_ms - (TIMER_LOOK_BACK >> 12)) & 0xFFFFF);
>
>    srv_node2 = eb32_lookup_ge(&srv->pendconns, key);
>    if (srv_node2 && !((srv_node2->key ^ srv_node->key) >> 20))
>          srv_node = srv_node2;
>
>    prx_node2 = eb32_lookup_ge(&prx->pendconns, key);
>    if (prx_node2 && !((prx_node2->key ^ prx_node->key) >> 20))
>          prx_node = prx_node2;
>
>    /* take the first one in time order */
>    if (((prx_node->key - srv_node->key) & 0xFFFFF) < 0x80000)
>        return srv_node;
>    else
>        return prx_node;
>
> All of this looks fairly reasonable to me and could possibly remove a
> bit of the complexity in the current function, which has to dereference
> the streams to find the queuing dates.
>
> Then the documentation becomes pretty simple :
>    - all requests from the lowest numbered priority classes are always
>      processed first
>    - within a same priority class, requests are processed in order of
>      arrival, to which a positive or negative time offset can be applied
>
> We could then have this in http-request and tcp-request :
>
>     http-request set-priority-class int(x)
>     http-request set-priority-offset int(x)
>
>
> BTW you were right to remind me about using a sample expression instead
> of a static integer in the argument. I've been used to using static
> values everywhere for years, but for such cases being able to use
> expressions is much more useful, as we could extract the value from a
> header or a cookie for example.
>
> Willy

The rest of this all seems fine. I've got no other thoughts to add.

So the question is: which route do we take?
1. 32-bit int (12/20 split)
2. 64-bit int (32/32 split)
3. 64-bit int (16/48 split) with absolute timestamps
4. 64-bit int with a single `set-priority` action. Let the user use a new
`ms()` fetch for time-based ordering. The user does the bit shifting if they
want to use both (using the `mul()` converter, or we can add a `shift()`).
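
For illustration, option 4 on the user side might end up looking something
like this (hypothetical syntax: neither `set-priority` nor `ms()` exists
today, and `mul(1048576)` stands in for a 20-bit left shift until there's
a `shift()` converter):

    http-request set-priority ms()                  # pure time-based ordering
    http-request set-priority int(3),mul(1048576)   # class 3, no time component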

-Patrick

