1) I did check iostat on all OSDs, and iowait seems normal.
2) ceph -w shows no correlation between high io wait and high iops.
Sometimes the reverse is true: when io wait is high (it is a cluster-wide
event), the overall ceph iops drops too.
3) We have collectd running in the VMs, and that's how we identified the
frequent high io wait. This happens even for lightly used VMs.
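
For anyone wanting to cross-check the collectd numbers by hand inside a
VM, here is a minimal sketch of how an io wait percentage can be derived
from two samples of the aggregate "cpu" line in /proc/stat. It assumes
the standard Linux field order (user, nice, system, idle, iowait, irq,
softirq, steal). The function names are made up for illustration, and
this is not necessarily how collectd computes its figure.

```python
def parse_cpu_line(line):
    """Return the counters user..steal from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Fields 1..8 are user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already included in user, so it is left out here.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no time elapsed between the samples
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as handle:
        return handle.readline()
```

To use it, call read_cpu_line() twice a few seconds apart and pass both
lines to iowait_percent(); sampling at the same interval as collectd
makes the numbers easier to compare.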

Thanks.
-Simon


On Thu, Jun 12, 2014 at 9:26 AM, David <[email protected]> wrote:

> Hi Simon,
>
> Did you check iostat on the OSDs to see their utilization? What does
> your ceph -w say - perhaps you’re maxing out your cluster’s IOPS?
> Also, are you monitoring your VMs’ iostats? We’ve often
> found a few culprits overusing IO.
>
> Kind Regards,
> David Majchrzak
>
> 12 jun 2014 kl. 15:22 skrev Xu (Simon) Chen <[email protected]>:
>
> > Hi folks,
> >
> > We have two similar ceph deployments, but one of them is having trouble:
> > VMs running with ceph-provided block devices are seeing frequent high io
> > wait every few minutes, usually 15-20%, but as high as 60-70%. This is
> > cluster-wide and not correlated with the VMs' IO load. We turned on rbd
> > cache and enabled writeback in qemu, but the problem persists. Setting
> > nodeep-scrub doesn't help either.
> >
> > Without offering any of our probably wrong theories: any ideas on
> > how to troubleshoot?
> >
> > Thanks.
> > -Simon
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>