On Tue, Jul 28, 2015 at 7:20 PM, van <[email protected]> wrote:
>
>> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov <[email protected]> wrote:
>>
>> On Tue, Jul 28, 2015 at 2:46 PM, van <[email protected]> wrote:
>>> Hi, Ilya,
>>>
>>> In the dmesg, there are also a lot of libceph socket errors, which I think
>>> may be caused by my stopping the ceph service without unmapping the rbd device.
>>
>> Well, sure enough, if you kill all the OSDs, the filesystem mounted on
>> top of an rbd device will get stuck.
>
> Sure, it will get stuck if the OSDs are stopped. And since rados requests have
> a retry policy, the stuck requests will recover after I start the daemons
> again.
>
> But in my case, the OSDs are running in a normal state and the librbd API can
> read/write normally.
> Meanwhile, a heavy fio test against the filesystem mounted on top of the rbd
> device gets stuck.
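>
> To be concrete about the librbd side, the check I mean is roughly the
> following (just a sketch using the python-rados/python-rbd bindings; the
> conf path, the 'rbd' pool and the 'test' image are placeholders for my
> actual setup):
>
>   import rados
>   import rbd
>
>   # connect through librados using the default config (placeholder path)
>   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>   cluster.connect()
>   try:
>       ioctx = cluster.open_ioctx('rbd')        # placeholder pool name
>       try:
>           image = rbd.Image(ioctx, 'test')     # placeholder image name
>           try:
>               image.write(b'x' * 4096, 0)      # 4 KiB write at offset 0
>               data = image.read(0, 4096)       # read the same range back
>               assert data == b'x' * 4096       # round trip through librbd works
>           finally:
>               image.close()
>       finally:
>           ioctx.close()
>   finally:
>       cluster.shutdown()
>
> This path goes through librados/librbd only, so it never touches the kernel
> client, and it completes fine while the fio run on the mapped device is stuck.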
>
> I wonder if this phenomenon is triggered by running the rbd kernel client on
> machines that also run ceph daemons, i.e. the annoying loopback mount deadlock
> issue.
>
> In my opinion, if it were due to the loopback mount deadlock, the OSDs would
> become unresponsive, no matter whether the requests come from user space
> (like the librbd API) or from the kernel client.
> Am I right?
Not necessarily.
>
> If so, my case seems to be triggered by another bug.
>
> Anyway, it seems that I should at least separate the client and the daemons.
Try 3.18.19 if you can. I'd be interested in your results.
Thanks,
Ilya
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com