I think I've got it.  I'll explain, and we'll have to look at
the explanation more closely in the morning.

Bottom line is that I think the assertion is bogus.  Like
the other assertion that was not done under protection of
a lock, this one is being done in a way that's not safe.

First, here's a patch that I think might avoid the problem:

----------------------------------
Index: b/drivers/block/rbd.c
===================================================================
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2130,9 +2130,8 @@ static void rbd_img_obj_callback(struct
        rbd_assert(which < img_request->obj_request_count);

        spin_lock_irq(&img_request->completion_lock);
-       if (which > img_request->next_completion)
+       if (which != img_request->next_completion)
                goto out;
-       rbd_assert(which == img_request->next_completion);

        for_each_obj_request_from(img_request, obj_request) {
                rbd_assert(more);
----------------------------------




Here's what I think is happening.  I'll annotate the function,
below.

First, we enter this function when an object request has
been marked "done."  In the case we're looking at, we
have an image request with two (or more) object requests.

static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
{
        struct rbd_img_request *img_request;
        u32 which = obj_request->which;
        bool more = true;

        rbd_assert(obj_request_img_data_test(obj_request));
        img_request = obj_request->img_request;

        dout("%s: img %p obj %p\n", __func__, img_request, obj_request);
        rbd_assert(img_request != NULL);
        rbd_assert(img_request->obj_request_count > 0);
        rbd_assert(which != BAD_WHICH);
        rbd_assert(which < img_request->obj_request_count);

Up to here, all is fine.  Well, *maybe*...  Anyway, I'll
work through that in the morning.  In any case, we are
examining fairly static fields in the object request.

        spin_lock_irq(&img_request->completion_lock);
        if (which > img_request->next_completion)
                goto out;

At this point we decide whether the object request now
completing is the next one.  If it's not, we're done;
whenever an earlier request (the first one) completes
it will ensure this one will get completed as well,
below.

But that's a problem.  In the loop below, the only condition
we're testing in order to account the completion of the
current *and subsequent* requests is whether each request
is marked "done."

What that means is that while the second request is waiting
to get the spinlock, the first one might be concurrently
going through the loop below.  It finds the second one "done"
and records that fact.  When it finishes the loop, it
updates the next_completion value and releases the lock.

The second request then enters the protected area, and
finds that its "which" value is *not* greater than the
next completion value.  It's in fact *less* than the
next_completion value, because the completion of this
second request has already been processed.

The problem got worse when we moved the spinlock (i.e.,
added protection around checking the next_completion
field) because now the second thread is actually waiting
before checking, so it's pretty much guaranteed it will
trigger the failure.

OK, back to bed.

                                        -Alex

        rbd_assert(which == img_request->next_completion);

        for_each_obj_request_from(img_request, obj_request) {
                rbd_assert(more);
                rbd_assert(which < img_request->obj_request_count);

                if (!obj_request_done_test(obj_request))
                        break;
                more = rbd_img_obj_end_request(obj_request);
                which++;
        }

        rbd_assert(more ^ (which == img_request->obj_request_count));
        img_request->next_completion = which;
out:
        spin_unlock_irq(&img_request->completion_lock);

        if (!more)
                rbd_img_request_complete(img_request);
}
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to