Can you check and see if swap is being used on your OSD servers when this happens, and even better, use something like collectl or another tool to look for major page faults?

If you see anything like this, you may want to tweak swappiness to be lower (say 10).
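
As a rough illustration of the check suggested above, here is a minimal Python sketch that reads swap usage and the major page fault counter straight from /proc on a Linux OSD host. It assumes the standard `SwapTotal`/`SwapFree` fields in /proc/meminfo and the `pgmajfault` counter in /proc/vmstat; the helper names and the 10-second sampling interval are made up for this example, and collectl or a similar tool will give you the same numbers with less effort.

```python
import time


def parse_counters(text):
    """Parse 'name value' or 'Name:  value kB' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_used_kb(meminfo_text):
    """Swap in use, in kB, computed from /proc/meminfo contents."""
    info = parse_counters(meminfo_text)
    return info["SwapTotal"] - info["SwapFree"]


def major_fault_delta(vmstat_before, vmstat_after):
    """Major page faults that occurred between two /proc/vmstat samples."""
    before = parse_counters(vmstat_before)
    after = parse_counters(vmstat_after)
    return after["pgmajfault"] - before["pgmajfault"]


def read_file(path):
    with open(path) as handle:
        return handle.read()


def report(interval=10):
    """Print swap usage, swappiness and major faults over `interval` seconds."""
    first = read_file("/proc/vmstat")
    time.sleep(interval)
    second = read_file("/proc/vmstat")
    print("swap used (kB):", swap_used_kb(read_file("/proc/meminfo")))
    print("vm.swappiness:", read_file("/proc/sys/vm/swappiness").strip())
    print("major faults in window:", major_fault_delta(first, second))
```

Calling `report()` on each OSD server while the VMs are showing high io wait should indicate whether swapping is involved. If it is, swappiness can be lowered with `sysctl vm.swappiness=10`.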

Mark

On 06/12/2014 03:17 PM, Xu (Simon) Chen wrote:
I've done some more tracing. It looks like the high IO wait in the VMs is
somewhat correlated with some OSDs having a high number of in-flight ops
(ceph admin socket, dump_ops_in_flight).
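
For anyone wanting to reproduce this kind of tracing, the following is a minimal Python sketch of polling the admin socket and flagging busy OSDs. It assumes the `ceph --admin-daemon <socket> dump_ops_in_flight` invocation and JSON output with a `num_ops` counter and an `ops` list. The exact fields can differ between Ceph releases, and the function names and threshold are hypothetical.

```python
import json
import subprocess


def query_admin_socket(socket_path, command):
    """Run one admin socket command and return its raw text output."""
    return subprocess.check_output(
        ["ceph", "--admin-daemon", socket_path, command]
    ).decode()


def count_in_flight(dump_json):
    """Number of in-flight ops reported by dump_ops_in_flight."""
    data = json.loads(dump_json)
    # Prefer the explicit counter; fall back to the length of the ops list.
    return data.get("num_ops", len(data.get("ops", [])))


def busiest_osds(counts, threshold):
    """Return (osd, count) pairs at or above threshold, busiest first."""
    busy = [(osd, n) for osd, n in counts.items() if n >= threshold]
    return sorted(busy, key=lambda item: item[1], reverse=True)
```

A typical use would be to call `query_admin_socket("/var/run/ceph/ceph-osd.0.asok", "dump_ops_in_flight")` for each OSD on a host (the socket path shown is the usual default and may differ on your systems), feed the results through `count_in_flight`, and record the output of `busiest_osds` alongside the VM io wait graphs.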

When in_flight_ops is high, I see something like this in the OSD log:
2014-06-12 19:57:24.572338 7f4db6bdf700  1 heartbeat_map reset_timeout 'OSD::op_tp thread 0x7f4db6bdf700' had timed out after 15
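
To line these messages up against the io wait spikes, a small Python sketch like the one below could count the timeouts per minute in an OSD log. It assumes each log entry sits on a single line in the format quoted above (the mail client wrapped it here); the regular expression and function name are illustrative only.

```python
import re
from collections import Counter

# Matches entries such as:
# 2014-06-12 19:57:24.572338 7f4db6bdf700  1 heartbeat_map reset_timeout
#   'OSD::op_tp thread 0x7f4db6bdf700' had timed out after 15
TIMEOUT_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\S+ "
    r".*heartbeat_map reset_timeout "
    r"'(?P<thread>[^']+)' had timed out after (?P<secs>\d+)"
)


def timeouts_per_minute(log_lines):
    """Count heartbeat_map timeout messages, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("minute")] += 1
    return counts
```

Passing an open log file to `timeouts_per_minute` gives per-minute counts that can be compared with the timestamps of the io wait peaks collected in the VMs.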

Any ideas why this happens?

Thanks.
-Simon



On Thu, Jun 12, 2014 at 11:14 AM, Mark Nelson <[email protected]> wrote:

    On 06/12/2014 08:47 AM, Xu (Simon) Chen wrote:

        1) I did check iostat on all OSDs, and iowait seems normal.
        2) ceph -w shows no correlation between high io wait and high iops.
        Sometimes the reverse is true: when io wait is high (since it's a
        cluster wide thing), the overall ceph iops drops too.


    Not sure if you are doing it yet, but you may want to look at the
    statistics the OSDs can provide via the admin socket, especially
    outstanding operations and dump_historic_ops.  If you look at these
    for all of your OSDs you can start getting a feel for whether any
    specific OSDs are slow and, if so, what the slow ops are hanging on.
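
One way to compare OSDs as described above is sketched below in Python. It assumes you have already collected the JSON output of `dump_historic_ops` from each OSD's admin socket, and that each entry in the `ops` list carries `duration` and `description` fields. Those field names are an assumption and may vary between Ceph releases, and the function name is hypothetical.

```python
import json


def slowest_ops(dumps_by_osd, top_n=5):
    """Rank historic ops across OSDs by duration, slowest first.

    dumps_by_osd maps an OSD name to the raw JSON text returned by
    dump_historic_ops for that OSD.
    """
    rows = []
    for osd, dump_json in dumps_by_osd.items():
        for op in json.loads(dump_json).get("ops", []):
            duration = float(op.get("duration", 0.0))
            rows.append((duration, osd, op.get("description", "")))
    rows.sort(reverse=True)
    return rows[:top_n]
```

If the same one or two OSDs keep showing up at the top of this list, that points at specific disks or hosts rather than a cluster-wide limit.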

        3) We have collectd running in VMs, and that's how we identified the
        frequent high io wait. This happens for even lightly used VMs.

        Thanks.
        -Simon


        On Thu, Jun 12, 2014 at 9:26 AM, David <[email protected]> wrote:

             Hi Simon,

             Did you check iostat on the OSDs to check their utilization?
             What does your ceph -w say - perhaps you’re maxing out your
             cluster’s IOPS? Also, are you running any monitoring of your
             VMs' iostats? We’ve often found some culprits overusing IO.

             Kind Regards,
             David Majchrzak

             On 12 Jun 2014 at 15:22, Xu (Simon) Chen <[email protected]> wrote:


              > Hi folks,
              >
              > We have two similar ceph deployments, but one of them is
              > having trouble: VMs running with ceph-provided block devices
              > are seeing frequent high io wait every few minutes, usually
              > 15-20% but as high as 60-70%. This is cluster-wide and not
              > correlated with the VMs' IO load. We turned on rbd cache and
              > enabled writeback in qemu, but the problem persists.
              > Disabling deep scrub doesn't help either.
              >
              > Without offering any of our probably wrong theories, any
              > ideas on how to troubleshoot?
              >
              > Thanks.
              > -Simon
              > _______________________________________________
              > ceph-users mailing list
              > [email protected]
              > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com










