On 2018/5/30 04:00, Willy Tarreau wrote:
> Hi Patrick,
>
> I'm finally back on this.
>
> On Mon, May 14, 2018 at 01:05:03AM -0400, Patrick Hemmer wrote:
>> On 2018/5/11 12:52, Patrick Hemmer wrote:
>>> This adds the set-priority-class and set-priority-offset actions to
>>> http-request and tcp-request content.
>>> The priority values are used when connections are queued to determine
>>> which connections should be served first. The lowest priority class is
>>> served first. When multiple requests from the same class are found, the
>>> earliest (according to queue_time + offset) is served first.
>>> ---
>>>  doc/configuration.txt  |  38 ++++++
>>>  doc/lua-api/index.rst  |  18 +++
>>>  include/types/proxy.h  |   3 +-
>>>  include/types/queue.h  |   2 +-
>>>  include/types/server.h |   3 +-
>>>  include/types/stream.h |   7 +-
>>>  src/cfgparse.c         |  15 +++
>>>  src/hlua.c             |  74 ++++++++---
>>>  src/log.c              |   4 +-
>>>  src/proto_http.c       |   4 +-
>>>  src/proxy.c            |   2 +-
>>>  src/queue.c            | 345 ++++++++++++++++++++++++++++++++++++++++++-------
>>>  src/server.c           |   2 +-
>>>  src/stream.c           |  10 +-
>>>  14 files changed, 453 insertions(+), 74 deletions(-)
>>>
>>>
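
As a concrete illustration of the proposed actions (the action names come
from the commit message above; the exact argument syntax is an assumption
and may differ in the final patch):

    frontend fe_main
        bind :80
        default_backend be_app

    backend be_app
        # hypothetical usage: lower classes are served first
        http-request set-priority-class int(-1) if { src -f vip.lst }
        # hypothetical usage: push batch requests 10000ms later in the queue
        http-request set-priority-offset int(10000) if { path_beg /batch }
        server app1 192.0.2.10:80
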
>> I found one issue I'll need to fix. During final code cleanup I
>> accidentally replaced good code with the `key_incr` function in 2 places
>> (pendconn_redistribute and pendconn_grab_from_px). Will fix and submit
>> new patch pending feedback.
> So I spent some time testing the code yesterday. Overall it works pretty
> well, and makes it possible to maintain very low response times for certain
> requests while others are higher (when using the class), or to finely adjust
> the response time distribution between requests.
>
> I found that using classes under high load results in the highest priority
> requests totally excluding the other ones, which is expected, while offsets
> provide a smoother transition but may not fit all purposes either.
>
> I noticed a strange effect: when injecting under low load with a higher
> priority (either offset or class) than another high-volume traffic stream,
> the response time on the higher priority traffic follows a sawtooth shape;
> it progressively rises from 0 to 50-80ms and then suddenly drops to zero again.
>
> I looked at the code to see if something could cause that. I found that the
> key increment could be a reason (you must restart from the next element,
> not an upper value since there will be many duplicate keys) 
Gah, I completely forgot about duplicate keys. Will fix.

> but it doesn't
> change anything. It could be something in the lower layers as well, I don't
> really know for now. At least what I've seen is that the server's response
> time, when attacked directly, is stable. It's very hard to say if this is
> due to the change or not given that it was not possible to set priorities
> in the past!
>
> By the way, regarding this "restart from next" thing, I looked at the locking
> and the element you find is perfectly safe to be used to get the next one. So
> the tree walk can definitely restart using eb32_next(node), modulo the
> wrapping going back to eb32_first() as we do in the scheduler. I'll see if I
> can propose something to you around this.
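
To make the "restart from next" idea concrete, here is a minimal sketch,
assuming the eb32 API from HAProxy's ebtree (eb32_first/eb32_next) and a
pendconn that embeds its eb32_node in a field named `node` (that field
name is an assumption):

    /* Return the pendconn to dequeue after <node>, restarting from the
     * following element rather than from key+1 so that duplicate keys
     * are not skipped, and wrapping back to eb32_first() as the
     * scheduler does when the walk falls off the end of the tree.
     */
    static struct pendconn *pendconn_next(struct eb_root *root,
                                          struct eb32_node *node)
    {
            node = node ? eb32_next(node) : eb32_first(root);
            if (!node)
                    node = eb32_first(root); /* wrap, as in the scheduler */
            if (!node)
                    return NULL;             /* tree is empty */
            return eb32_entry(node, struct pendconn, node);
    }
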
>
> I found some minor suggestions that I can take care of, like this :
>
> @@ -125,6 +125,9 @@ struct stream {
>  
>         struct server *srv_conn;        /* stream already has a slot on a server and is not in queue */
>         struct pendconn *pend_pos;      /* if not NULL, points to the pending position in the pending queue */
> +       int16_t priority_class;         /* priority class of the stream for the pending queue */
> +       int32_t priority_offset;        /* priority offset of the stream for the pending queue */
> +       unsigned int cntdepend;         /* value of proxy/server cntdepend at time of enqueue */
>  
>         struct http_txn *txn;           /* current HTTP transaction being processed. Should become a list. */
>
> This is creating a 48-bit hole between the two fields, I'll check if there's
> a nearby 16-bit hole to place the priority_class entry. Otherwise I'll add
> a comment to mention there's a hole there.
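
For what it's worth, a hedged illustration of the padding in question,
assuming 64-bit pointers and natural alignment (the field names are copied
from the hunk above; the struct is reduced to the relevant members):

    #include <stdint.h>

    struct example {
            struct pendconn *pend_pos;   /* 8 bytes, 8-byte aligned */
            int16_t priority_class;      /* 2 bytes */
                                         /* 2 bytes of padding here */
            int32_t priority_offset;     /* 4 bytes */
            unsigned int cntdepend;      /* 4 bytes */
                                         /* 4 bytes of padding before the
                                          * next 8-byte aligned pointer */
            struct http_txn *txn;
    };

The 2 + 4 bytes of padding add up to the 48 bits mentioned; moving
priority_class into an existing 16-bit hole elsewhere in the struct would
reclaim them.
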
>
> --- a/src/cfgparse.c
> +++ b/src/cfgparse.c
> @@ -8131,6 +8131,21 @@ out_uri_auth_compat:
>                         }
>                 }
>  
> +               if ((curproxy->mode == PR_MODE_TCP || curproxy->mode == PR_MODE_HTTP) &&
> +                   (curproxy->cap & PR_CAP_BE) && (curproxy->srv) &&
> +                    (!curproxy->timeout.queue || curproxy->timeout.queue > (((TIMER_LOOK_BACK >> 12) & 0xfffff) / 2))) {
> +                       // Note that this warning isn't comprehensive. If the user specifies
> +                       // a set-priority-offset larger than 'timeout queue', wrapping may occur
> +                       // and requests may get de-queued out of order. But logging this every
> +                       // single time might be too noisy.
> +                       ha_warning("config : excessive timeout queue for backend '%s'.\n"
> +                                  "   | The 'timeout queue' setting is either missing, or exceeds the maximum\n"
> +                                  "   | priority offset. If a request sits in the queue for longer than the maximum\n"
> +                                  "   | priority offset, it may get de-queued out of order.\n",
> +                                  curproxy->id);
> +                       err_code |= ERR_WARN;
> +               }
> +
>                 if ((curproxy->options2 & PR_O2_CHK_ANY) == PR_O2_SSL3_CHK) {
>                         curproxy->check_len = sizeof(sslv3_client_hello_pkt) - 1;
>                         curproxy->check_req = malloc(curproxy->check_len);
> This should be part of another commit as it will possibly cause warnings
> with currently valid configurations. Also, given that there is possibly
> nothing the user can do about it, I'm really not sure that a warning is
> welcome there. The purpose of warnings is always to encourage users to
> fix them. This would be pretty fine for a diagnostic level though (which
> sadly we still don't have).
The warning is addressable. It means the user should add a `timeout
queue` setting and keep it below 524287ms. This is similar to the
"missing timeouts for frontend/backend" warning we print. Though I think
it would make sense to add "queue" to that existing warning instead, as
it already mentions the timeouts for client, connect & server.
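
For reference, a minimal sketch of a configuration that would satisfy
such a check (the 524287ms bound is taken from the description above;
names and addresses are placeholders):

    backend be_app
        # keep 'timeout queue' below the maximum priority offset
        # (524287ms) so a queued request cannot outlive the offset
        # range and get de-queued out of order after wrapping
        timeout queue 30s
        server app1 192.0.2.10:80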

>
> For all the bounds set on the class and offset values, it would be better
> to have some functions than to perform these checks everywhere :
>
>       static inline int queue_adjust_class(int class)
>       {
>               if (class < -0x7ff)
>                       class = -0x7ff;
>               else if (class > 0x7ff)
>                       class = 0x7ff;
>               return class;
>       }
>
>       static inline int queue_adjust_offset(int offset)
>       {
>               if (offset < -0x7ffff)
>                       offset = -0x7ffff;
>               else if (offset > 0x7ffff)
>                       offset = 0x7ffff;
>               return offset;
>       }
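
Presumably the call sites would then reduce to something like this (a
sketch only; the stream field names are taken from the struct hunk above,
and `arg_class`/`arg_offset` are hypothetical locals holding the parsed
action arguments):

    /* clamp user-supplied values into the ranges the queue keys encode */
    s->priority_class  = queue_adjust_class(arg_class);
    s->priority_offset = queue_adjust_offset(arg_offset);
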
>
>
> --- a/include/types/proxy.h
> +++ b/include/types/proxy.h
> @@ -322,9 +322,10 @@ struct proxy {
>                 int serverfin;                  /* timeout to apply to server half-closed connections */
>         } timeout;
>         char *id, *desc;                        /* proxy id (name) and description */
> -       struct list pendconns;                  /* pending connections with no server assigned yet */
> +       struct eb_root pendconns;               /* pending connections with no server assigned yet */
>         int nbpend;                             /* number of pending connections with no server assigned yet */
>         int totpend;                            /* total number of pending connections on this instance (for stats) */
> +       unsigned int cntdepend;                 /* number of pending connections which have been de-queued */
>         unsigned int feconn, beconn;            /* # of active frontend and backends streams */
>         struct freq_ctr fe_req_per_sec;         /* HTTP requests per second on the frontend */
>         struct freq_ctr fe_conn_per_sec;        /* received connections per second on the frontend */
> We need to find a more suitable name for this "cntdepend", as the name
> really doesn't convey what it counts. I don't have anything to propose
> for now, I'm afraid.
Yeah, not fond of the name, but it was something, and is easy to change.
The name originated from "cnt"=counter, and "pend" to be consistent with
the naming of "nbpend" and "totpend" right above it. So: counter of
de-queued pending connections.

>
>
> --- a/include/types/stream.h
> +++ b/include/types/stream.h
> @@ -105,8 +105,8 @@ struct strm_logs {
>         long  t_connect;                /* delay before the connect() to the server succeeds, -1 if never occurs */
>         long  t_data;                   /* delay before the first data byte from the server ... */
>         unsigned long t_close;          /* total stream duration */
> -       unsigned long srv_queue_size;   /* number of streams waiting for a connect slot on this server at accept() time (in direct assignment mode) */
> -       unsigned long prx_queue_size;   /* overall number of streams waiting for a connect slot on this instance at accept() time */
> +       unsigned long srv_queue_pos;    /* number of streams de-queued while waiting for a connection slot on this server */
> +       unsigned long prx_queue_pos;    /* number of streams de-queued while waiting for a connection slot on this instance */
>         long long bytes_in;             /* number of bytes transferred from the client to the server */
>         long long bytes_out;            /* number of bytes transferred from the server to the client */
>  };
>
> I found that a non-negligible part of the patch is in fact due only to
> this rename, which needlessly impacts the review or later code study (e.g.
> if we later bisect to this patch when digging into an issue). I'm not
> against this rename, which apparently makes sense, but we should do it in
> a first patch. In fact it's the same for the cntdepend one above:
> apparently you store a new metric in the proxy which helps calculate the
> queue size differently, and this change is independent of the
> offset+class change, so let's do it separately as well. If we later
> notice a change in the reported queue sizes or any such thing, it will be
> easier to know if it's a different reporting due to the way the measure
> is now performed or if it's a result of the change of algorithm.
Agree.

>
> @@ -2467,6 +2469,10 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
>                 return t; /* nothing more to do */
>         }
>  
> +       // remove from pending queue here so we update counters
> +       if (s->pend_pos)
> +               pendconn_free(s->pend_pos);
> +
>         if (s->flags & SF_BE_ASSIGNED)
>                 HA_ATOMIC_SUB(&s->be->beconn, 1);
> This part is not supposed to be needed since it's already performed
> in stream_free() a few lines later. Or if it's required to get the
> queue values correctly before logging, then something similar will
> be required at every place where we log after trying to connect to
> a server (there are a few places depending on the log format and
> the logasap option).
Yes, it was put there to update the counters before logging. Rethinking
this, the counter updates need to be moved into
pendconn_process_next_stream anyway: since that function updates
px->cntdepend and is called in a loop, calculating logs.prx_queue_pos
after the loop completes will yield incorrect values.
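
A minimal sketch of what moving that accounting into the dequeue path
could look like (the helper name is hypothetical; cntdepend and
prx_queue_pos come from the hunks above, and the real patch may structure
this differently):

    /* Called for each stream as it is picked from the queue, so the
     * position reflects the counter at that exact moment rather than a
     * value read after the whole dequeue loop has finished. */
    static void pendconn_record_position(struct stream *strm)
    {
            strm->logs.prx_queue_pos = strm->be->cntdepend - strm->cntdepend;
    }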

>
> Now I think I understand how cntdepend works: isn't it simply a
> wrapping queue index that you use to measure the difference between
> when you queued and when you dequeued? In this case, wouldn't something
> like "queue_idx" be more explicit? At least it would make it more
> obvious that we measure a length by subtracting two indexes.
It does measure a difference, yes, but the difference is the number of
items that were processed off the queue. I personally wouldn't call it an
index, as index implies position. `px->cntdepend` isn't the position of
anything, but a counter of processed connections. `strm->cntdepend` isn't
a position either, but a snapshot of the px->cntdepend value at the time
the stream was queued. When you combine the two you get a position: e.g.
if px->cntdepend was 100 at enqueue and 137 at dequeue, 37 connections
were served while the stream waited.

>
> I'm going back to trying to add some detailed timestamps at some points
> in order to figure out whether we're spending a growing amount of time
> in the queue or anywhere else regarding this strange timer variation.
>
> Cheers,
> Willy

Did you have any thoughts on the 64-bit absolute timestamp patch? It was
my understanding that the earlier objections were due to anticipated
performance and complexity issues. But the performance is better, the
code is simpler, and it has fewer issues (queue timeout, wrapping at
extreme offset values, etc).

I'll try to get to submitting a revised patch set in the next couple days.

-Patrick
