On 05/11/2014 04:33 AM, Ilya Dryomov wrote:
> On Sun, May 11, 2014 at 7:11 AM, Alex Elder <[email protected]> wrote:
>> On 05/10/2014 05:18 PM, Hannes Landeholm wrote:
>>> Hello,
>>>
>>> I have a development machine that I have been running stress tests on
>>> for a week as I'm trying to reproduce some hard-to-reproduce failures.
>>> I've mentioned the same machine previously in the thread "rbd unmap
>>> deadlock". I just now noticed that some processes had completely
>>> stalled. I looked in the system log and saw this crash about 9 hours
>>> ago:
>>
>> Are you still running kernel rbd as a client of ceph
>> services running on the same physical machine?
>>
>> I personally believe that scenario may be at risk of
>> deadlock in any case; we haven't taken great care to
>> avoid it.
>>
>> Anyway...
>>
>> I can build v3.14.1, but I don't know what kernel configuration
>> you are using; knowing that would be helpful. I built it using
>> a config of my own, though, and it's *possible* you crashed on
>> this line, in rbd_segment_name():
>>
>>     ret = snprintf(name, CEPH_MAX_OID_NAME_LEN + 1, name_format,
>>                    rbd_dev->header.object_prefix, segment);
>>
>> If so, the only reason I can think of for this failing is that
>> rbd_dev->header.object_prefix was null (or an otherwise bad
>> pointer value). But at this point it's a lot of speculation.
>
> More precisely, it crashed on
>
>     segment = offset >> rbd_dev->header.obj_order;

After looking more closely at this tonight, I can say I concur:

    kernel: BUG: unable to handle kernel paging request at ffff87ff3fbcdc58
    RAX: ffff87ff3fbcdc00
    2483: 00 00 00 be movzbl 0x58(%rax),%ecx

Unfortunately that's about all I can say right now.

Since the stack includes rbd_request_fn(), we know this request
came from the block layer, which means rbd_img_request_create()
was not being called for a parent image request. On the other
hand, if you're right about a use-after-free, an image request
created on the parent-request path could still be involved (if
a parent image request were freed while it was still in use).
Hannes indicated that layered images were involved.

More later...

-Alex

> while loading obj_order. rbd_dev is ffff87ff3fbcdc00, which suggests
> a use-after-free of some sort. (This is the first rbd_dev deref after
> grabbing it from img_request at the top of rbd_img_request_fill(),
> which got it from request_queue::queuedata in rbd_request_fn().)
>
> Thanks,
>
> Ilya
>