Hi Patrick,

I'm finally back on this.

On Mon, May 14, 2018 at 01:05:03AM -0400, Patrick Hemmer wrote:
> On 2018/5/11 12:52, Patrick Hemmer wrote:
> > This adds the set-priority-class and set-priority-offset actions to
> > http-request and tcp-request content.
> > The priority values are used when connections are queued to determine
> > which connections should be served first. The lowest priority class is
> > served first. When multiple requests from the same class are found, the
> > earliest (according to queue_time + offset) is served first.
> > ---
> >  doc/configuration.txt  |  38 ++++++
> >  doc/lua-api/index.rst  |  18 +++
> >  include/types/proxy.h  |   3 +-
> >  include/types/queue.h  |   2 +-
> >  include/types/server.h |   3 +-
> >  include/types/stream.h |   7 +-
> >  src/cfgparse.c         |  15 +++
> >  src/hlua.c             |  74 ++++++++---
> >  src/log.c              |   4 +-
> >  src/proto_http.c       |   4 +-
> >  src/proxy.c            |   2 +-
> >  src/queue.c            | 345 ++++++++++++++++++++++++++++++++++++++++++-------
> >  src/server.c           |   2 +-
> >  src/stream.c           |  10 +-
> >  14 files changed, 453 insertions(+), 74 deletions(-)
> >
> >
> I found one issue I'll need to fix. During final code cleanup I
> accidentally replaced good code with the `key_incr` function in 2 places
> (pendconn_redistribute and pendconn_grab_from_px). Will fix and submit
> new patch pending feedback.

So I spent some time testing the code yesterday. Overall it works pretty
well: it makes it possible to maintain very low response times for certain
requests while others run higher (when using the class), or to finely adjust
the response time distribution between requests (when using the offset).

I found that using classes under high load results in the highest priority
requests totally starving out the other ones, which is expected, while
offsets provide a smoother transition but may not fit all purposes either.

I noticed a strange effect: when injecting under low load with a higher
priority (either offset or class) than other high-volume traffic, the
response time of the higher priority traffic follows a sawtooth shape; it
progressively rises from 0 to 50-80ms and then suddenly drops back to zero.

I looked at the code to see if something could cause that. I found that the
key increment could be a reason (you must restart from the next element,
not from an upper key value, since there will be many duplicate keys), but
fixing it doesn't change anything. It could be something in the lower layers
as well; I don't really know for now. At least what I've seen is that the
server's response time is stable when it is attacked directly. It's very
hard to say whether this is due to the change, given that it was not
possible to set priorities in the past!

By the way, regarding this "restart from next" thing, I looked at the locking,
and the element you find is perfectly safe to use to get the next one. So the
tree walk can definitely restart using eb32_next(node), modulo the wrapping
back to eb32_first() as we do in the scheduler. I'll see if I can propose
something around this.

I have some minor suggestions that I can take care of myself, like this:

@@ -125,6 +125,9 @@ struct stream {
 
        struct server *srv_conn;        /* stream already has a slot on a server and is not in queue */
        struct pendconn *pend_pos;      /* if not NULL, points to the pending position in the pending queue */
+       int16_t priority_class;         /* priority class of the stream for the pending queue */
+       int32_t priority_offset;        /* priority offset of the stream for the pending queue */
+       unsigned int cntdepend;         /* value of proxy/server cntdepend at time of enqueue */
 
        struct http_txn *txn;           /* current HTTP transaction being processed. Should become a list. */

This creates a 48-bit hole between the two fields. I'll check whether there's
a nearby 16-bit hole to place the priority_class entry in; otherwise I'll add
a comment mentioning that there's a hole there.

--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -8131,6 +8131,21 @@ out_uri_auth_compat:
                        }
                }
 
+               if ((curproxy->mode == PR_MODE_TCP || curproxy->mode == PR_MODE_HTTP) &&
+                   (curproxy->cap & PR_CAP_BE) && (curproxy->srv) &&
+                    (!curproxy->timeout.queue || curproxy->timeout.queue > (((TIMER_LOOK_BACK >> 12) & 0xfffff) / 2))) {
+                       // Note that this warning isn't comprehensive.
+                       // If the user specifies set-priority-offset > 'timeout queue', wrapping
+                       // may occur and requests may get de-queued out of order. But logging
+                       // this every single time might be too noisy.
+                       ha_warning("config : excessive timeout queue for backend '%s'.\n"
+                                  "   | The 'timeout queue' setting is either missing, or exceeds the maximum\n"
+                                  "   | priority offset. If a request sits in the queue for longer than the maximum\n"
+                                  "   | priority offset, it may get de-queued out of order.\n",
+                                  curproxy->id);
+                       err_code |= ERR_WARN;
+               }
+
                if ((curproxy->options2 & PR_O2_CHK_ANY) == PR_O2_SSL3_CHK) {
                        curproxy->check_len = sizeof(sslv3_client_hello_pkt) - 1;
                        curproxy->check_req = malloc(curproxy->check_len);

This should be part of another commit, as it may cause warnings with
currently valid configurations. Also, given that there is possibly nothing
the user can do about it, I'm really not sure a warning is welcome there.
The purpose of warnings is always to encourage users to fix them. This
would be perfectly fine at a diagnostic level, though (which sadly we
still don't have).

For all the bounds set on the class and offset values, it would be better
to have helper functions rather than performing these checks everywhere:

        static inline int queue_adjust_class(int class)
        {
                if (class < -0x7ff)
                        class = -0x7ff;
                else if (class > 0x7ff)
                        class = 0x7ff;
                return class;
        }

        static inline int queue_adjust_offset(int offset)
        {
                if (offset < -0x7ffff)
                        offset = -0x7ffff;
                else if (offset > 0x7ffff)
                        offset = 0x7ffff;
                return offset;
        }


--- a/include/types/proxy.h
+++ b/include/types/proxy.h
@@ -322,9 +322,10 @@ struct proxy {
                int serverfin;                  /* timeout to apply to server half-closed connections */
        } timeout;
        char *id, *desc;                        /* proxy id (name) and description */
-       struct list pendconns;                  /* pending connections with no server assigned yet */
+       struct eb_root pendconns;               /* pending connections with no server assigned yet */
        int nbpend;                             /* number of pending connections with no server assigned yet */
        int totpend;                            /* total number of pending connections on this instance (for stats) */
+       unsigned int cntdepend;                 /* number of pending connections which have been de-queued */
        unsigned int feconn, beconn;            /* # of active frontend and backends streams */
        struct freq_ctr fe_req_per_sec;         /* HTTP requests per second on the frontend */
        struct freq_ctr fe_conn_per_sec;        /* received connections per second on the frontend */

We need to find a more suitable name for this "cntdepend", as the name
really doesn't tell me anything about what it does. I don't have anything
to propose for now, I'm afraid.


--- a/include/types/stream.h
+++ b/include/types/stream.h
@@ -105,8 +105,8 @@ struct strm_logs {
        long  t_connect;                /* delay before the connect() to the server succeeds, -1 if never occurs */
        long  t_data;                   /* delay before the first data byte from the server ... */
        unsigned long t_close;          /* total stream duration */
-       unsigned long srv_queue_size;   /* number of streams waiting for a connect slot on this server at accept() time (in di
-       unsigned long prx_queue_size;   /* overall number of streams waiting for a connect slot on this instance at accept() t
+       unsigned long srv_queue_pos;    /* number of streams de-queued while waiting for a connection slot on this server */
+       unsigned long prx_queue_pos;    /* number of streams de-queued while waiting for a connection slot on this instance */
        long long bytes_in;             /* number of bytes transferred from the client to the server */
        long long bytes_out;            /* number of bytes transferred from the server to the client */
 };

I found that a non-negligible part of the patch is in fact due only to
this rename, which needlessly impacts the review and later code study
(e.g. if we later bisect to this patch while digging into an issue). I'm
not against this rename, which apparently makes sense, but we should do
it in a separate first patch. In fact it's the same for the cntdepend one
above: apparently you store a new metric in the proxy which helps
calculate the queue size differently, and this change is independent of
the offset+class change, so let's do it separately as well. If we later
notice a change in the reported queue sizes or any such thing, it will be
easier to tell whether it comes from the way the measure is now performed
or from the change of algorithm.

@@ -2467,6 +2469,10 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
                return t; /* nothing more to do */
        }
 
+       // remove from pending queue here so we update counters
+       if (s->pend_pos)
+               pendconn_free(s->pend_pos);
+
        if (s->flags & SF_BE_ASSIGNED)
                HA_ATOMIC_SUB(&s->be->beconn, 1);

This part is not supposed to be needed, since it's already performed in
stream_free() a few lines later. Or, if it's required to get the queue
values right before logging, then something similar will be required at
every place where we log after trying to connect to a server (there are
a few such places depending on the log format and the logasap option).

Now I think I understand how cntdepend works: isn't it simply a wrapping
queue index that you use to measure the difference between when you queued
and when you dequeued? In that case, wouldn't something like "queue_idx" be
more explicit? At least it then becomes obvious, when doing the measure,
that we measure a length by subtracting two indexes.

I'm going back to trying to add some detailed timestamps at various points
in order to figure out whether we're spending a growing amount of time in
the queue or anywhere else, regarding this strange timer variation.

Cheers,
Willy
