On Fri, May 04, 2018 at 06:49:00PM -0400, Patrick Hemmer wrote:
> I'm not quite following the need for multiple queues. Why wouldn't you
> just have one sorted queue, where if multiple pending requests have the
> same priority, then they're FIFO.
That's what the time-ordered queue does. I proposed this so that you can
ensure that lower priorities are always processed first until there is no
more of the same level.

> I had sent another email which proposed an implementation that would
> satisfy both designs
> (https://www.mail-archive.com/haproxy@formilux.org/msg29876.html).
> Unfortunately I was composing it when you sent your most recent reply,
> so I didn't see it before I sent.

Thanks, I missed this one indeed.

> I went ahead and implemented the functionality I had talked about in
> that other email, and have attached a WIP patch. I've done some testing
> on it, and it seems to work fine. The syntax is exactly as proposed:
>
>   # time-based priority
>   http-request set-priority date(20) if LOW_PRIORITY
>   http-request set-priority date(-20) if HIGH_PRIORITY
>
>   # score-based priority
>   http-request set-priority int(20) if LOW_PRIORITY
>   http-request set-priority int(-20) if HIGH_PRIORITY
>
> Some notes on the implementation:
>
> I used a `long` for tracking the priority. Obviously this has
> limitations when used with date(), as the timestamps are fairly large. I
> can change this to `long long`.

Ah, now I understand how you wanted to implement it. I have a big concern
here with wrapping time however, which is what I explained in an earlier
mail where I said that if needed we could mix date and priority in the
same integer, but that would still require two lookups. Our date wraps
every 49.7 days (32-bit millisecond resolution), so depending on the
phases of the moon, some requests will be processed before the highest
priority ones or after, and you'll see some priority inversion 7 times a
year.

> The queue was implemented using the existing pendconns code. The only
> thing that was changed was to insert in sorted order. Obviously this is
> just a quick and dirty implementation. There are lots of priority queue
> implementations to choose from, and I don't know which one is best
> suited to this kind of task.
> We could keep the existing sorted
> linked-list, and just do some optimizations for insertion. Or we could
> switch to some sort of tree. The linked-list has the advantage in that I
> suspect that the vast majority of additions will append to the end, and
> pops are always off the front, so well suited to a linked list. A tree
> might become very unbalanced very fast, and require lots of maintenance.

No, trees are quite cheap, we use them absolutely everywhere now. Just
take a look at wake_expired_tasks() to get an idea. In fact, every single
time we've relied on lists doing linear lookups, we've received several
bug reports about haproxy eating all the CPU because some people were
using it in a way that wasn't initially considered reasonable but which
made a lot of sense to them. In the case of the list above, the corner
case would be a server with "maxconn 5000" and half of the requests
having a high priority and the other half a low one, resulting in an
average of 2500 visits in the optimal case where you can insert from both
sides (which is what we used to do a very long time ago in the scheduler).

> Using the date() fetch obviously doesn't provide good precision.

Hmmm, good point. Here it's seconds, so it will not wrap for the next 20
years but will provide useless precision.

> We could add a `datems()`, `datens()`, or `ticks()` (ms since start)
> sample fetch. But obviously we would need to use `long long` for that.
> Though ticks could use a `long`, but would wrap at around 16 months.

No, 49.7 days ;-)

The other problem I'm having here is that it artificially mixes two types
of totally unrelated data and relies on the lack of intersection between
their domains (when time doesn't wrap). Indeed, here you're considering
that values below a certain number cannot be dates, so they definitely
are fixed priorities. But that also means that people who want to use
ranges will randomly collide with dates and will observe changing
behaviour along the year.
In this case you'd be better off with the principle consisting in using a
larger integer with the priority at the top and the date at the bottom.
But that's where I said that it would require two lookups to deal with
the wrapping date, so I'm not sure it's any better than using multiple
queues. Well, at least it's less of a maintenance burden. The detailed
principle is the following :

    uint64_t priority :

      63                  32 31                     0
     [    hard priority    |          date          ]

    next = lookup_first(queue);
    if (!next)
        return;
    => next is the lowest value of both priority and date

Then you perform a second lookup with the oldest past date for the same
priority level :

    key = (next->key & 0xffffffff00000000ULL) | (now_ms - TIMER_LOOK_BACK);
    next2 = lookup_ge(queue, key);
    if (next2 && !((next2->key ^ next->key) >> 32))
        next = next2;

Then you dequeue next, which is always the next one. That's where I
thought that using two distinct levels was simpler, but if we factor in
the fact that it would limit the number of priority classes and would
require more maintenance code, it's probably not worth the hassle to save
the 4 extra lines we have above to respect the wrapping date.

Also I'm thinking that we can even use 32 bits by putting the frontier
between the date and the fixed priority (let's call it the "class")
somewhere else :

  - 8-bit class + 24-bit offset  => 256 classes and +/- 2.3-hour offsets
  - 12-bit class + 20-bit offset => 4096 classes and +/- 8-minute offsets
  - 16-bit class + 16-bit offset => 64k classes and +/- 32-second offsets

I'm pretty sure that the first two options are way more than enough to
cover almost everyone's needs.
Let's say for a moment that we'd use the second option (12+20 bits). The
queue insertion code becomes this :

    pendconn->node.key = (stream->prio_class << 20) +
                         ((now_ms + stream->prio_offset) & 0xFFFFF);
    if (srv)
        eb32_insert(&srv->pendconns, &pendconn->node);
    else
        eb32_insert(&px->pendconns, &pendconn->node);

And the dequeuing code in pendconn_process_next_strm(), which takes care
both of the backend's and server's queues while still respecting their
classes, would become (modulo locking, which is not trivial there but
which has to be preserved regardless of the solution) :

    srv_node = eb32_first(&srv->pendconns);
    prx_node = eb32_first(&prx->pendconns);
    if (!srv_node)
        return prx_node;
    if (!prx_node)
        return srv_node;

    if (srv_node->key >> 20 < prx_node->key >> 20)
        return srv_node;
    if (srv_node->key >> 20 > prx_node->key >> 20)
        return prx_node;

    /* classes are equal, pick oldest expiration first */
    key = (srv_node->key & 0xfff00000) +
          ((now_ms - (TIMER_LOOK_BACK >> 12)) & 0xFFFFF);

    srv_node2 = eb32_lookup_ge(&srv->pendconns, key);
    if (srv_node2 && !((srv_node2->key ^ srv_node->key) >> 20))
        srv_node = srv_node2;

    prx_node2 = eb32_lookup_ge(&prx->pendconns, key);
    if (prx_node2 && !((prx_node2->key ^ prx_node->key) >> 20))
        prx_node = prx_node2;

    /* take the first one in time order */
    if (((prx_node->key - srv_node->key) & 0xFFFFF) < 0x80000)
        return srv_node;
    else
        return prx_node;

All of this looks fairly reasonable to me and could possibly remove a bit
of the complexity in the current function, which has to dereference the
streams to find the queuing dates.
Then the documentation becomes pretty simple :

  - all requests from the lowest numbered priority classes are always
    processed first

  - within a same priority class, requests are processed in order of
    arrival, to which a positive or negative time offset can be applied

We could then have this in http-request and tcp-request :

    http-request set-priority-class int(x)
    http-request set-priority-offset int(x)

BTW, you were right to remind me about using a sample expression instead
of a static integer in the argument. I've been used to using static
values everywhere for years, but for such cases, being able to use
expressions is much more useful, as we could extract the value from a
header or cookie for example.

Willy