Hi Patrick,

On Thu, May 31, 2018 at 12:16:27AM -0400, Patrick Hemmer wrote:
> > I looked at the code to see if something could cause that. I found
> > that the key increment could be a reason (you must restart from the
> > next element, not an upper value, since there will be many duplicate
> > keys).
> Gah, I completely forgot about duplicate keys. Will fix.

I'll send you (later today) some split patches that will help with this.
They will simplify review and experimentation on the walk algorithm.
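
To make the trap more concrete, here is a rough illustration of the two
ways to resume the walk (hypothetical code using the eb32 API, not the
actual patch):

    /* Hypothetical sketch of resuming a walk over an eb32 tree that
     * may contain duplicate keys (not the actual patch code).
     */
    #include <eb32tree.h>

    void walk_all(struct eb_root *root)
    {
            struct eb32_node *node = eb32_first(root);

            while (node) {
                    /* ... process node ... */

                    /* WRONG: restarting from an upper key value skips
                     * the remaining duplicates of node->key:
                     *     node = eb32_lookup_ge(root, node->key + 1);
                     *
                     * RIGHT: resume from the next element, which still
                     * visits every duplicate of the current key:
                     */
                    node = eb32_next(node);
            }
    }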
> The warning is addressable. It means the user should add a `timeout
> queue` and set it to less than 524287ms. This is similar to the
> "missing timeouts for frontend/backend" warning we print. Though I
> think it would make sense to add "queue" to that existing warning
> instead, as it already mentions the timeouts for client, connect &
> server.

In fact the problem I'm having is that I've already seen some very high
timeout values in the field (e.g. on a print server), which means that
some people do need a large value here. For sure they don't need to
reach 24 days, but 8 minutes will definitely be too short for certain
extreme cases. And while I understand the impact of such large values on
the queue, I just don't know how we can address this at the moment; it's
what I'm still thinking about.

> > We need to find a more suitable name for this "cntdepend" as it
> > really doesn't tell me that it has anything to do with what it says.
> > I don't have anything to propose for now, I'm afraid.
> Yeah, not fond of the name, but it was something, and it is easy to
> change. The name originated from "cnt"=counter, and "pend" to be
> consistent with the naming of "nbpend" and "totpend" right above it.
> So: counter of de-queued pending connections.

OK. In fact, given that it's a wrapping counter, that's why I considered
it could be an index for the srv and px parts, while the stream part
stores a distance between the recorded index and the current index.

> > @@ -2467,6 +2469,10 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
> >             return t; /* nothing more to do */
> >     }
> >
> > +   // remove from pending queue here so we update counters
> > +   if (s->pend_pos)
> > +           pendconn_free(s->pend_pos);
> > +
> >     if (s->flags & SF_BE_ASSIGNED)
> >             HA_ATOMIC_SUB(&s->be->beconn, 1);
> >
> > This part is not supposed to be needed since it's already performed
> > in stream_free() a few lines later. Or, if it's required to get the
> > queue values correct before logging, then something similar will be
> > required at every place where we log after trying to connect to a
> > server (there are a few places depending on the log format and the
> > logasap option).
> Yes, it was put there to update the counters before logging.
> Re-thinking this, the updating of the counters needs to be moved into
> pendconn_process_next_stream anyway. Since this function updates
> px->cntdepend and is called in a loop, calculating logs.prx_queue_pos
> after this loop completes will yield incorrect values.

I thought about the same thing but was not yet completely certain. I
suspect in fact that the stream's value should indeed be updated upon
dequeue, and that only the error case has to be taken into account
before logging (if there was no dequeue). In this case I think we can
have this one performed here, provided we ensure the other log places
will not be triggered on error.

> > Now I think I understand how the cntdepend works: isn't it simply a
> > wrapping queue index that you use to measure the difference between
> > when you queued and when you dequeued? In this case, wouldn't
> > something like "queue_idx" be more explicit? At least it becomes
> > more obvious when doing the measurement that we measure a length by
> > subtracting two indexes.
> It is a measure of a difference, yes, but it's the difference between
> how many items were processed off the queue. I personally wouldn't
> call it an index, as index implies position.

I personally see it as a position. Well, exactly a sequence number in
fact :-)

> `px->cntdepend` isn't a position of anything, but is a counter of
> processed connections.

Except that it's a wrapping counter, and I'd hate to be tempted to use
it as a counter with the same meaning as what we use in stats.

> `strm->cntdepend` isn't a position either, but a snapshot of the
> px->cntdepend value when it was queued. When you combine the 2 you
> get a position.
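
To illustrate why I see it as a sequence number: with a wrapping
unsigned index, the position simply falls out of an unsigned
subtraction, and it stays valid across the wrap. A rough sketch only
(simplified field names, not the patch itself):

    /* Rough sketch with simplified structures: px->queue_idx is
     * incremented on each dequeue, strm->queue_idx is a snapshot of
     * it taken when the stream was queued.
     */
    struct proxy  { unsigned int queue_idx; };
    struct stream { unsigned int queue_idx; };

    static inline unsigned int queue_position(const struct proxy *px,
                                              const struct stream *strm)
    {
            /* Unsigned arithmetic wraps modulo 2^32, so the distance
             * remains correct even if px->queue_idx wrapped since the
             * snapshot was taken.
             */
            return px->queue_idx - strm->queue_idx;
    }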

(...)

> Did you have any thoughts on the 64-bit absolute timestamp patch? It
> was my understanding that the earlier objections were due to
> anticipated performance and complexity issues. But the performance is
> better, the code is simpler, and it has fewer issues (queue timeout,
> wrapping at extreme offset values, etc).

Yes, I had some thoughts, but mixed ones. It's indeed simpler, but from
the beginning there is something I don't like in it that I'm totally
unable to explain. I really hate this because I know it feels totally
unfair, but I'm really convinced that if we take it this way we'll
regret it later. I'm a bit embarrassed, and I have been thinking for
some time about what bothers me, but I can't pin it down very well.

The root cause of the problem is changing the way the timers work.
Everywhere we use a perfectly fine wrapping timer that can run
continuously and ensures that bugs (if any) are quickly met and
resolved. Here, with a non-wrapping time, we more or less consider that
since we'll all be dead when it wraps, we don't care what happens. And
this "don't care" approach really doesn't appeal to me, because we may
overlook serious problems that will trigger much earlier than expected,
or that could depend on the platform or anything. You know, a bit like
the people who used "9/9/99" as an infinite time 25 years ago, until
their applications stopped on that date, or long-term contracts were
dropped, or anything.

I don't like the extra cost of 64-bit processing on 32-bit platforms,
but I also know that we care much less about 32-bit platforms when it
comes to performance, so if technically we need it, I'm OK with it. I
don't like the *1000 and /1000 done there either, which add a (small)
cost to the processing, but that can be addressed as well if we really
need this. I just really want to ensure everything works fine by
construction, and not just because we're in a certain window of time
which ensures that the numbers assemble well together.
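
To show what I mean by "works by construction", here is a simplified
sketch of the wrapping-timer technique (the general idea only, not our
actual tick code):

    /* Simplified sketch of wrapping-timer expiration (not the actual
     * haproxy tick code). With a free-running 32-bit millisecond
     * clock, expiration is decided from the signed difference between
     * two dates, which stays correct across the 2^32 wrap as long as
     * the two dates are less than 2^31 ms apart.
     */
    #include <stdint.h>

    static inline int timer_expired(uint32_t now_ms, uint32_t expire_ms)
    {
            /* The cast to signed turns the wrapping difference into
             * the distance between the two dates, whatever their
             * absolute values.
             */
            return (int32_t)(now_ms - expire_ms) >= 0;
    }

This is why a continuously wrapping clock exercises the wrap on every
run: any comparison done incorrectly shows up quickly, instead of
waiting decades for a 64-bit counter to overflow.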

> I'll try to get to submitting a revised patch set in the next couple
> of days.

I'll send you privately the split-up of your patch set that I did
yesterday; it will help you, and it will be better for me once you send
it back. I can even merge some parts of it already, those which are
probably definitive.

Thanks,
Willy
