We actually disabled swap altogether on these machines...
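Mark suggests a check in the quoted thread: is swap in use, and are there major page faults? Below is a minimal sketch of that check. It assumes a Linux OSD host and Python 3.9+. It reads SwapTotal and SwapFree from /proc/meminfo and the current value of /proc/sys/vm/swappiness. It also samples the kernel's cumulative pgmajfault counter in /proc/vmstat twice to estimate major faults per second. The function names are hypothetical and are not part of Ceph or collectl.

```python
import time
from pathlib import Path


def parse_kv(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style ('Key:  123 kB') or /proc/vmstat-style ('key 123') text."""
    values = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def swap_used_kb(meminfo_text: str) -> int:
    """Swap in use, in kB: SwapTotal minus SwapFree (0 when swap is disabled)."""
    m = parse_kv(meminfo_text)
    return m.get("SwapTotal", 0) - m.get("SwapFree", 0)


def major_faults_per_sec(vmstat_before: str, vmstat_after: str, interval_s: float) -> float:
    """Rate of major page faults from two samples of the cumulative pgmajfault counter."""
    delta = parse_kv(vmstat_after)["pgmajfault"] - parse_kv(vmstat_before)["pgmajfault"]
    return delta / interval_s


def main(interval_s: float = 5.0) -> None:
    """Print swap usage, vm.swappiness and the major-fault rate for this host (Linux only)."""
    meminfo = Path("/proc/meminfo").read_text()
    swappiness = Path("/proc/sys/vm/swappiness").read_text().strip()
    print(f"swap used: {swap_used_kb(meminfo)} kB (vm.swappiness = {swappiness})")
    before = Path("/proc/vmstat").read_text()
    time.sleep(interval_s)
    after = Path("/proc/vmstat").read_text()
    print(f"major page faults/s: {major_faults_per_sec(before, after, interval_s):.1f}")
```

Call main() on each OSD server while the VM iowait spikes are happening. With swap disabled, the swap line should read 0 kB. A sustained nonzero major-fault rate can still indicate memory pressure, because major faults also cover file-backed pages that have to be re-read from disk.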
On Thu, Jun 12, 2014 at 5:06 PM, Gregory Farnum <[email protected]> wrote:
> To be clear, that's the solution to one of the causes of this issue.
> The log message is very general, and just means that a disk access
> thread has been gone for a long time (15 seconds, in this case)
> without checking in (so usually, it's been inside a read/write
> syscall for >=15 seconds).
> Other causes include simple overload of the OSDs in question, or a
> broken local filesystem, or...
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Thu, Jun 12, 2014 at 1:59 PM, Mark Nelson <[email protected]> wrote:
> > Can you check whether swap is being used on your OSD servers when this
> > happens and, even better, use something like collectl or another tool to
> > look for major page faults?
> >
> > If you see anything like this, you may want to lower swappiness
> > (say, to 10).
> >
> > Mark
> >
> > On 06/12/2014 03:17 PM, Xu (Simon) Chen wrote:
> >> I've done some more tracing. It looks like the high IO wait in the VMs is
> >> somewhat correlated with some OSDs having many in-flight ops (ceph admin
> >> socket, dump_ops_in_flight).
> >>
> >> When the in-flight op count is high, I see something like this in the OSD log:
> >> 2014-06-12 19:57:24.572338 7f4db6bdf700 1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f4db6bdf700' had timed out after 15
> >>
> >> Any ideas why this happens?
> >>
> >> Thanks.
> >> -Simon
> >>
> >> On Thu, Jun 12, 2014 at 11:14 AM, Mark Nelson <[email protected]> wrote:
> >>
> >>     On 06/12/2014 08:47 AM, Xu (Simon) Chen wrote:
> >>
> >>         1) I did check iostat on all OSDs, and iowait seems normal.
> >>         2) ceph -w shows no correlation between high io wait and high
> >>         iops. Sometimes the reverse is true: when io wait is high (since
> >>         it's a cluster-wide thing), the overall ceph iops drops too.
> >>
> >>     Not sure if you are doing it yet, but you may want to look at the
> >>     statistics the OSDs can provide via the admin socket, especially
> >>     outstanding operations and dump_historic_ops. If you look at these
> >>     for all of your OSDs, you can start getting a feel for whether any
> >>     specific OSDs are slow and, if so, what the slow ops are hung up on.
> >>
> >>         3) We have collectd running in the VMs, and that's how we
> >>         identified the frequent high io wait. This happens even for
> >>         lightly used VMs.
> >>
> >>         Thanks.
> >>         -Simon
> >>
> >>         On Thu, Jun 12, 2014 at 9:26 AM, David <[email protected]> wrote:
> >>
> >>             Hi Simon,
> >>
> >>             Did you check iostat on the OSDs to see their utilization?
> >>             What does your ceph -w say? Perhaps you're maxing out your
> >>             cluster's IOPS. Also, are you monitoring your VMs' iostats?
> >>             We've often found a few culprits overusing IO.
> >>
> >>             Kind Regards,
> >>             David Majchrzak
> >>
> >>             On 12 Jun 2014, at 15:22, Xu (Simon) Chen <[email protected]> wrote:
> >>
> >>             > Hi folks,
> >>             >
> >>             > We have two similar ceph deployments, but one of them is
> >>             > having trouble: VMs running on ceph-provided block devices
> >>             > see frequent high io wait every few minutes, usually
> >>             > 15-20% but as high as 60-70%. This is cluster-wide and not
> >>             > correlated with the VMs' IO load. We turned on rbd cache and
> >>             > enabled writeback in qemu, but the problem persists. Setting
> >>             > nodeep-scrub doesn't help either.
> >>             >
> >>             > Without offering any of our (probably wrong) theories, does
> >>             > anyone have ideas on how to troubleshoot this?
> >>             >
> >>             > Thanks.
> >>             > -Simon
> >>             > _______________________________________________
> >>             > ceph-users mailing list
> >>             > [email protected]
> >>             > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
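Mark's advice to compare admin-socket statistics across all OSDs can be scripted. Below is a minimal sketch. It assumes Python 3.9+ and the classic `ceph --admin-daemon <socket> dump_ops_in_flight` invocation. It also assumes default socket paths of the form /var/run/ceph/ceph-osd.<id>.asok. The reply is assumed to be JSON with a `num_ops` count and an `ops` list whose entries carry an `age` in seconds. Field names can differ between Ceph releases, so treat them as assumptions. The helper names are hypothetical.

```python
import json
import subprocess


def dump_ops_in_flight(osd_id: int, asok_dir: str = "/var/run/ceph") -> dict:
    """Query one local OSD through its admin socket (assumes default socket naming)."""
    sock = f"{asok_dir}/ceph-osd.{osd_id}.asok"
    out = subprocess.run(
        ["ceph", "--admin-daemon", sock, "dump_ops_in_flight"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def summarize(dump: dict) -> tuple[int, float]:
    """Return (number of in-flight ops, age in seconds of the oldest one)."""
    ops = dump.get("ops", [])
    num = dump.get("num_ops", len(ops))
    oldest = max((float(op.get("age", 0)) for op in ops), default=0.0)
    return num, oldest


def rank_osds(dumps: dict[int, dict]) -> list[tuple[int, int, float]]:
    """Sort OSDs by in-flight op count, then by oldest op age, busiest first."""
    rows = [(osd, *summarize(d)) for osd, d in dumps.items()]
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)
```

For example, on each host, collect `{i: dump_ops_in_flight(i) for i in local_ids}` and print `rank_osds(...)` of that dict. Here `local_ids` is a hypothetical list of the OSD ids running on that host. OSDs that repeatedly top the list while the VMs stall are the ones to inspect further with dump_historic_ops.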
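Greg explains that each heartbeat_map line means an OSD worker thread went longer than its grace period (15 seconds here) without checking in. To line these events up with the VM iowait graphs from collectd, a small parser can bucket them by minute. Below is a minimal sketch of such a parser. It assumes the log format of the line Simon quotes; other Ceph versions may format it differently. The function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the heartbeat_map line format quoted in this thread (an assumption for
# other Ceph releases): timestamp, thread id, log level, then the timeout message.
HB_TIMEOUT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+[0-9a-f]+\s+\d+\s+"
    r"heartbeat_map \w+ '(?P<thread>[^']+)' had timed out after (?P<grace>\d+)"
)


def parse_timeouts(lines):
    """Yield (timestamp, thread description, grace in seconds) for each timeout line."""
    for line in lines:
        m = HB_TIMEOUT.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m["thread"], int(m["grace"])


def per_minute(events):
    """Count timeout events per wall-clock minute."""
    return Counter(ts.strftime("%Y-%m-%d %H:%M") for ts, _thread, _grace in events)
```

For example, `per_minute(parse_timeouts(open(path)))` gives per-minute counts over one OSD's log file, where `path` is a hypothetical name for that file. These counts can then be compared against the collectd iowait timeline from the VMs.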
