On Tue, Jul 28, 2015 at 7:20 PM, van <[email protected]> wrote:
>
>> On Jul 28, 2015, at 7:57 PM, Ilya Dryomov <[email protected]> wrote:
>>
>> On Tue, Jul 28, 2015 at 2:46 PM, van <[email protected]> wrote:
>>> Hi, Ilya,
>>>
>>>  In the dmesg output, there are also a lot of libceph socket errors,
>>> which I think may be caused by my stopping the ceph service without
>>> unmapping rbd.
>>
>> Well, sure enough, if you kill all OSDs, the filesystem mounted on top
>> of the rbd device will get stuck.
>
> Sure, it will get stuck if the OSDs are stopped. And since rados requests
> have a retry policy, the stuck requests will recover after I start the
> daemons again.
>
> But in my case, the OSDs are running in a normal state and the librbd API
> can read/write normally.
> Meanwhile, a heavy fio test against the filesystem mounted on top of the
> rbd device gets stuck.
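>
> For reference, here is roughly the kind of librbd check I mean, as a
> minimal sketch using the Python bindings (the pool name "rbd" and image
> name "test" are placeholders, not necessarily my actual setup):
>
>     import rados
>     import rbd
>
>     # connect to the cluster using the default config and keyring
>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>     cluster.connect()
>     ioctx = cluster.open_ioctx('rbd')
>
>     # write 4 KiB and read it back through librbd, bypassing the
>     # kernel client entirely (assumes the image already exists)
>     image = rbd.Image(ioctx, 'test')
>     data = b'x' * 4096
>     image.write(data, 0)
>     assert image.read(0, 4096) == data
>
>     image.close()
>     ioctx.close()
>     cluster.shutdown()
>
> A check like this goes through the userspace OSD path only, which is how
> I can tell the OSDs themselves are still serving requests while the
> kernel rbd mount is stuck.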
>
> I wonder if this phenomenon is triggered by running the rbd kernel client
> on machines that also run ceph daemons, i.e. the annoying loopback mount
> deadlock issue.
>
> In my opinion, if it were due to the loopback mount deadlock, the OSDs
> would become unresponsive, no matter whether the requests come from user
> space (e.g. the librbd API) or from the kernel client.
> Am I right?

Not necessarily.  The loopback deadlock is a memory-pressure problem:
writeback to the rbd device can get stuck waiting on the co-located OSD
without that OSD necessarily appearing unresponsive to other clients.

>
> If so, my case seems to be triggered by another bug.
>
> Anyway, it seems that I should at least separate the clients and the daemons.

Try 3.18.19 if you can.  I'd be interested in your results.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
