Hello,

I am running the following:

ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
ubuntu 14.04 with kernel 3.19.0-49-generic #55~14.04.1-Ubuntu SMP

For this use case I am mapping and mounting an rbd using the kernel client
and exporting the ext4 filesystem via NFS to a number of clients.

Once or twice a week we've seen disk I/O "stuck" or "blocked" on the rbd
device. When this happens, iostat shows avgqu-sz at a constant number with
utilization at 100%. All I/O operations via NFS block, though I am able to
traverse the filesystem locally on the NFS server and read/write data. If I
wait long enough the device eventually recovers and avgqu-sz drops back to
zero.
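For anyone wanting to spot the same symptom, here is a minimal sketch of the
kind of check I mean. The device name (rbd0) and the column positions are
assumptions; verify them against the header of `iostat -dx 1` on your own
system. The sample line below is illustrative, not real output from this host:

```shell
# Illustrative iostat -dx sample line for rbd0 (fields: device, rrqm/s,
# wrqm/s, r/s, w/s, rkB/s, wkB/s, avgrq-sz, avgqu-sz, await, r_await,
# w_await, svctm, %util). Column numbers are an assumption; check your
# iostat version's header before relying on them.
line="rbd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 128.00 0.00 0.00 0.00 0.00 100.00"

# avgqu-sz is field 9 here; %util is the last field.
avgqu=$(echo "$line" | awk '{print $9}')
util=$(echo "$line" | awk '{print $NF}')

# Flag the stuck condition: utilization pinned at 100% with a flat queue.
if [ "$util" = "100.00" ]; then
  echo "rbd0 looks stuck: avgqu-sz=$avgqu util=$util%"
fi

# Live usage would pipe real samples through the same filter, e.g.:
#   iostat -dx 1 | awk '$1 == "rbd0"'
```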

The only issue I could find that was similar to this is:
http://tracker.ceph.com/issues/8818 - However, I am not seeing the error
messages described and I am running a more recent version of the kernel
that should contain the fix from that issue. So, I assume this is likely a
different problem.

The Ceph cluster reported healthy the entire time: all PGs were up and in,
no scrubbing was going on, and there were no OSD failures or anything like
that.

I ran echo t > /proc/sysrq-trigger and the output is here:
https://gist.github.com/anonymous/89c305443080149e9f45
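In case it helps others reproduce the same diagnosis: tasks stuck on I/O
like this usually sit in uninterruptible sleep (state "D"), and the wchan
column hints at where they're blocked. A minimal sketch of filtering for
them; the sample lines below are made up for illustration (including the
wchan name), not real output from this host:

```shell
# Illustrative ps-style sample (fields: pid, state, wchan, command).
# These lines are fabricated examples, not actual output from the NFS server.
sample="  101 D    rbd_img_request  nfsd
  202 S    poll_schedule    sshd"

# Select only tasks in uninterruptible sleep (state beginning with D).
stuck=$(echo "$sample" | awk '$2 ~ /^D/ {print $1}')
echo "blocked pids: $stuck"

# Live usage (procps) would be something like:
#   ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'
```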

Any ideas on what could be going on here? Is there any additional
information I can provide?

Thanks,
Randy Orr
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
