On Tue, 17 May 2011, Sage Weil wrote:
> On Tue, 17 May 2011, Jim Schutt wrote:
> > Hi Sage,
> >
> > Sage Weil wrote:
> > > Hi Jim,
> >
> > > This picks up the second resent message. But and OPENING is set.. so how
> > > is
> > > it that we were in try_write but we didn't pass through the OPENING test
> > > in
> > > con_work?
> > >
> > > > > > con_work()/try_write() start again on new message.
> > > > > > Note that con_work() did not notice that the connection was marked
> > > > > > to
> > > > > > be closed/reopened! This is where things went bad.
> > > > > > Note the connection state: 8220 = 0x201c, and 0x2000 is OPENING.
> > > > [ 2412.643178] libceph: messenger.c:1969 : con_work ffff8801955c0030
> > > > OPENING
> > > > [ 2412.643182] libceph: messenger.c:292 : con_close_socket on
> > > > ffff8801955c0030 sock ffff8801825a8c00
> > > > [ 2412.643197] libceph: messenger.c:162 : ceph_state_change
> > > > ffff8801955c0030 state = 2076 sk_state = 4
> > >
> > > Ok, now we see it, but this should have executed before we ever got to
> > > try_write!
> > >
> > > There are a handful of places that drop and re-take con->mutex... I think
> > > it
> > > has to be one of those. They are:
> > >
> > > - prepare_connect_authorizer (before calling ->prepare_authorizer
> > > callback)
> > > - fail_protocol (before calling ->bad_proto callback)
> > > - process_connect RESETSESSION case (before calling ->reset callback)
> > > - process_message (before calling ->dispatch callback)
> > >
> > > I think after retaking the lock in each of these cases we need to test for
> > > the CLOSED or OPENING bits and, if set, bail out so that con_work can
> > > restart.
> > >
> > > Does that make sense? Can you confirm which case of lock droppage inside
> > > con_work is at fault? (Maybe adding a printk at the top of con_work after
> > > taking the lock too would make it more clear what's going on; currently it
> > > only prints in all the random test_bit cases).
> >
> > So I've got logs now that show that connection state changes
> > across a call to process_message. That seems like it proves
> > the concept.
> >
> > >
> > > I'm thinking something like the below... does that seem reasonable? I
> > > only
> > > did the prepare_authorizer case to start, but the others should be fixed
> > > up
> > > too. Compile tested only.
> > >
> >
> > Is there any reason not to fix all the cases at once, in try_read?
> > It's less code, but far away from where the mutex dropping occurs,
> > so we add a comment:
>
> That covers the process_message and process_connect cases, provided you
> also set ret properly, and do the con_close_socket work that the con_work
> checks would have done. e.g.,
I pushed a patch to the msgr_race branch that catches all four cases (I
think). Does the fix make sense given what you saw?
Thanks!
sage
>
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index e15a82c..7ef9eb0 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -1810,6 +1810,21 @@ static int try_read(struct ceph_connection *con)
> more:
> dout("try_read tag %d in_base_pos %d\n", (int)con->in_tag,
> con->in_base_pos);
> +
> + /*
> + * process_connect and process_message drop and re-take
> + * con->mutex. make sure we handle a racing close or reopen.
> + */
> + if (test_bit(CLOSED, &con->state)) {
> + ret = -EAGAIN;
> + goto out;
> + }
> + if (test_bit(OPENING, &con->state)) {
> + con_close_socket(con);
> + ret = 0; /* so we continue to try_write */
> + goto out;
> + }
> +
> if (test_bit(CONNECTING, &con->state)) {
> if (!test_bit(NEGOTIATING, &con->state)) {
> dout("try_read connecting\n");
>
> It's a little bit unweildly because we repeat the checks at the top of
> con_work, though. It is probably cleaner to always fall back to that case
> by doing CLOSED|OPENING bit checks and bailing with -EAGAIN any time we
> retake con->mutex, and then checking try_read/write return values like the
> patch from yesterday does. That way we only react to the bits in one
> place.
>
> Have you verified that this fixes the problem for you? It makes sense the
> race is over ->dispatch, since that is a somewhat long-running call.
>
> sage
>
>
>
> >
> > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> > index a9bd424..11d6a61 100644
> > --- a/net/ceph/messenger.c
> > +++ b/net/ceph/messenger.c
> > @@ -1816,10 +1820,19 @@ static int try_read(struct ceph_connection *con)
> > dout("try_read start on %p state %lu\n", con, con->state);
> >
> > more:
> > dout("try_read %p state %lu tag %d in_base_pos %d\n", con,
> > con->state, (int)con->in_tag, con->in_base_pos);
> > +
> > + /* process_connect, process_message need to drop con->mutex,
> > + * so connection state might have changed on us. Check it,
> > + * leave if necessary so new state can be processed.
> > + */
> > + if (test_bit(CLOSED, &con->state) ||
> > + test_bit(OPENING, &con->state))
> > + goto out;
> > +
> > if (test_bit(CONNECTING, &con->state)) {
> > if (!test_bit(NEGOTIATING, &con->state)) {
> > dout("try_read %p connecting\n", con);
> > ret = read_partial_banner(con);
> > if (ret <= 0)
> >
> >
> > What am I missing?
> >
> > -- Jim
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html