On Thu, Mar 15, 2018 at 11:32:41AM +0100, Christopher Faulet wrote:
> I tested with an ugly and really unsafe/buggy hack to migrate the stream and
> its client FD before waking it up, and this problem disappears. So this is
> definitely the right way to handle it. But for now, this could be a problem
> for everyone using maxconn with threads.

Well, it cannot be worse than the current situation! Also, don't worry, as
in general when working with very low maxconn, you face high wake-up rates
because operations are much less batched, so the workloads exhibiting this
case where some tasks are idle waiting for a connection shouldn't be that
common, and that will buy us time to try to address this better.

> > > +static struct stream *pendconn_process_next_strm(struct server *srv, 
> > > struct proxy *px)
> > >   {
> > > + struct pendconn *p = NULL;
> > > + struct server   *rsrv;
> > >           rsrv = srv->track;
> > >           if (!rsrv)
> > >                   rsrv = srv;
> > > + if (srv->nbpend) {
> > > +         list_for_each_entry(p, &srv->pendconns, list) {

-- mark here --

> > > +                 if (!HA_SPIN_TRYLOCK(PENDCONN_LOCK, &p->lock))
> > > +                         goto ps_found;
> > > +         }
> > > +         p = NULL;
> > > + }
> > > +
> > > +  ps_found:
> > > + if (srv_currently_usable(rsrv) && px->nbpend) {
> > > +         struct pendconn *pp;
> > > +
> > > +         list_for_each_entry(pp, &px->pendconns, list) {
> > > +                 /* If the server pendconn is older than the proxy one,
> > > +                  * we process the server one. */
> > > +                 if (p && !tv_islt(&pp->strm->logs.tv_request, 
> > > &p->strm->logs.tv_request))
> > > +                         goto pendconn_found;
> > 
> > Here there is still a race : pp->strm is dereferenced out of the lock,
> > so the owner could very well decide to disappear in the mean time. I
> > think this check needs to be moved in the trylock below. Or another
> > option (probably simpler and cleaner) would be to add a tv_request
> > field to the pendconn, and directly use it.
> 
> Well, not really. When we access to a pendconn in a queue (proxy or server)
> we always get a lock on it. In other side, a stream must release its
> pendconn, if any, before being released. To do so the right queue must be
> locked by the stream. So when we manipulate a pendconn in a queue we are
> sure that the stream cannot release it in same time. And if the stream
> released it, it cannot be in the queue.
> 
> Everything should be thread-safe because I added an lock on the pendconn
> _AND_ the pendconn is now only released by the stream itself. These 2
> conditions are necessary and sufficient to guarantee the thread-safety.

OK that's fine. I think that you should indicate for certain functions
that they must only be called under this or that lock (I remember seeing
one this morning in this situation, I don't remember which one).

But then what prevents a stream from removing itself from the queue at
the mark above ? It takes the lock, removes itself, unlocks, but <p> is
still seen by the function, the trylock succeeds (on just freed memory),
the stream is dereferenced again after being freed, and we may even
unlink it twice, immediately corrupting memory. I may be wrong but I
don't see what prevents this case from being possible. If it is, you may
need to use another lock when manipulating this list (likely the server
lock as you initially did), though that could become expensive.

OK for the rest :-)

Willy

Reply via email to