Hello,
We have a Ceph cluster with CephFS and RBD in use; from XCP-ng we connect
directly to the RBD images. Several times a day the VMs suffer from high
load/iowait, which makes them temporarily inaccessible (around 10-30
seconds). In the logs on the XCP-ng hosts I find this:
[Thu Sep 9 02:16:06 2021] rbd: rbd4: encountered watch error: -107
[Thu Sep 9 02:17:47 2021] rbd: rbd3: encountered watch error: -107
[Thu Sep 9 02:18:55 2021] rbd: rbd4: encountered watch error: -107
[Thu Sep 9 02:19:54 2021] rbd: rbd3: encountered watch error: -107
[Thu Sep 9 02:49:39 2021] rbd: rbd3: encountered watch error: -107
[Thu Sep 9 03:47:25 2021] rbd: rbd3: encountered watch error: -107
[Thu Sep 9 03:48:07 2021] rbd: rbd4: encountered watch error: -107
[Thu Sep 9 04:47:30 2021] rbd: rbd3: encountered watch error: -107
[Thu Sep 9 04:47:55 2021] rbd: rbd4: encountered watch error: -107
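(Error -107 is ENOTCONN, i.e. the kernel rbd client lost its watch
connection to the OSD holding the image header and has to re-register it.)
If it helps, the next time this happens we can capture some extra state on
the affected XCP-ng host, roughly like this (rbd/vm-disk-01 is just a
placeholder for one of the affected images):

  # watchers currently registered on the image header
  rbd status rbd/vm-disk-01

  # in-flight requests and OSD sessions of the in-kernel client
  # (requires debugfs to be mounted)
  cat /sys/kernel/debug/ceph/*/osdc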
Xen version: XCP-ng release 8.2.0 (xenenterprise), kernel 4.19.0+1, running
on 4 physical nodes.
The Ceph cluster consists of 6 physical nodes with 48 OSDs (NVMe), 3 mgr,
3 mon and 3 mds daemons; all hosts are connected with 2x10 Gbps trunks.
Ceph status/health detail is OK, and we see no iowait, high CPU or network
spikes. We have looked through the logs for a cause, but we are unable to
match the watch errors with anything. Sometimes a scrub is in progress
while they occur, but not always. Where is the best place to continue the
search?
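One direction we are considering (we are not sure it is the right one) is
to check on the cluster side whether our client addresses show up in the
blacklist around those times and whether any OSD reports slow ops, roughly
(osd.0 is just an example):

  # clients that the cluster has blacklisted (Nautilus still calls it blacklist)
  ceph osd blacklist ls

  # health detail around the time of the watch errors
  ceph health detail

  # operations currently in flight on a given OSD (run on that OSD's host)
  ceph daemon osd.0 dump_ops_in_flight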
ceph.conf:
[global]
fsid = ******
mon_initial_members = ceph-c01-mon-n1, ceph-c01-mon-n2, ceph-c01-mon-n3
mon_host = *.*.*.170,*.*.*.171,*.*.*.172
public network = *.*.*.0/19
cluster network = *.*.*.0/19
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
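If it matters, we can also dump the effective watch-related options on one
of the OSDs (again, osd.0 is just an example):

  # effective values of the watch-related OSD options
  ceph config show-with-defaults osd.0 | grep -i watch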
Ceph versions:
{
    "mon": {
        "ceph version 14.2.22 () nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.22 () nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.22 () nautilus (stable)": 48
    },
    "mds": {
        "ceph version 14.2.22 () nautilus (stable)": 3
    },
    "overall": {
        "ceph version 14.2.22 () nautilus (stable)": 57
    }
}