On Sun, May 11, 2014 at 7:11 AM, Alex Elder <[email protected]> wrote:
> On 05/10/2014 05:18 PM, Hannes Landeholm wrote:
>> Hello,
>>
>> I have a development machine that I have been running stress tests on
>> for a week, trying to reproduce some hard-to-reproduce failures.
>> I've mentioned the same machine previously in the thread "rbd unmap
>> deadlock". I just now noticed that some processes had completely
>> stalled. I looked in the system log and saw this crash about 9 hours
>> ago:
>
> Are you still running kernel rbd as a client of ceph
> services running on the same physical machine?
>
> I personally believe that scenario is at risk of
> deadlock in any case -- we haven't taken great care
> to avoid it.
>
> Anyway...
>
> I can build v3.14.1 but I don't know what kernel configuration
> you are using. Knowing that could be helpful. I built it using
> a config I have though, and it's *possible* you crashed on
> this line, in rbd_segment_name():
>     ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
>                    rbd_dev->header.object_prefix, segment);
> And if so, the only reason I can think that this failed is if
> rbd_dev->header.object_prefix were null (or an otherwise bad
> pointer value). But at this point it's a lot of speculation.
More precisely, it crashed on

    segment = offset >> rbd_dev->header.obj_order;

while loading obj_order.  rbd_dev is ffff87ff3fbcdc00, which suggests
a use-after-free of some sort.  (This is the first rbd_dev deref after
grabbing it from img_request at the top of rbd_img_request_fill(),
which got it from request_queue::queuedata in rbd_request_fn().)
Thanks,
Ilya