The first thing I would try is a mgr failover.
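
For example (a sketch; "proxmox3" is the active mgr according to your status output, adjust to taste):

   ceph mgr fail proxmox3

That marks the active mgr as failed so the standby on proxmox2 takes over; if the stats reporting was stuck in the old daemon, the freshly promoted mgr should rebuild the pgmap.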

Quoting Eneko Lacunza <elacu...@binovo.es>:

Hi all,

I'm trying to diagnose an issue in a tiny cluster that is showing the following status:


root@proxmox3:~# ceph -s
  cluster:
    id:     80d78bb2-6be6-4dff-b41d-60d52e650016
    health: HEALTH_WARN
            1/3 mons down, quorum 0,proxmox3
            Reduced data availability: 513 pgs inactive

  services:
    mon: 3 daemons, quorum 0,proxmox3 (age 3h), out of quorum: 1
    mgr: proxmox3(active, since 16m), standbys: proxmox2
    osd: 12 osds: 8 up (since 3h), 8 in (since 3h)

  task status:

  data:
    pools:   2 pools, 513 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             513 unknown

The cluster has 3 nodes, each with 4 OSDs. One of the nodes was offline for 3 weeks, and when we brought it back online, VMs stalled on disk I/O.

That node has been shut down again; we're trying to understand the current status first, and then we will try to diagnose the issue with the troubled node.

Currently the VMs are working and can read their RBD volumes, but there seems to be some kind of mgr issue (?) with stats.

There is no firewall on the nodes or between the 3 nodes (all are on the same switch). Ping works on both the Ceph public and private networks.
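
Ping only proves ICMP reachability, though. A quick sketch to check that the OSD TCP ports actually accept connections from the mgr node (address/port taken from the netstat output further below; assumes nc is installed):

   nc -zv 192.168.133.102 6800
   nc -zv 192.168.134.102 6800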

The MGR log shows this continuously:
2022-05-26T13:49:45.603+0200 7fb78ba3f700  0 auth: could not find secret_id=1892
2022-05-26T13:49:45.603+0200 7fb78ba3f700  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=1892
2022-05-26T13:49:45.983+0200 7fb77a18d700  1 mgr.server send_report Not sending PG status to monitor yet, waiting for OSDs
2022-05-26T13:49:47.983+0200 7fb77a18d700  1 mgr.server send_report Not sending PG status to monitor yet, waiting for OSDs
2022-05-26T13:49:49.983+0200 7fb77a18d700  1 mgr.server send_report Not sending PG status to monitor yet, waiting for OSDs
2022-05-26T13:49:51.983+0200 7fb77a18d700  1 mgr.server send_report Giving up on OSDs that haven't reported yet, sending potentially incomplete PG state to mon
2022-05-26T13:49:51.983+0200 7fb77a18d700  0 log_channel(cluster) log [DBG] : pgmap v3: 513 pgs: 513 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2022-05-26T13:49:53.983+0200 7fb77a18d700  0 log_channel(cluster) log [DBG] : pgmap v4: 513 pgs: 513 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2022-05-26T13:49:55.983+0200 7fb77a18d700  0 log_channel(cluster) log [DBG] : pgmap v5: 513 pgs: 513 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2022-05-26T13:49:57.987+0200 7fb77a18d700  0 log_channel(cluster) log [DBG] : pgmap v6: 513 pgs: 513 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2022-05-26T13:49:58.403+0200 7fb78ba3f700  0 auth: could not find secret_id=1892
2022-05-26T13:49:58.403+0200 7fb78ba3f700  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=1892

So it seems that the mgr is unable to contact the OSDs for stats, and then reports incomplete info to the mon.
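
One thing worth ruling out for the repeated "auth: could not find secret_id" messages is clock skew: cephx service keys rotate over time, and if a node's clock drifted while it was down for 3 weeks, daemons can present tickets for a secret_id the mgr no longer (or not yet) has. A sketch to check time sync (first command on a mon node, second on each node):

   ceph time-sync-status
   timedatectl status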

I see the following OSD ports open:
tcp        0      0 192.168.134.102:6800    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.133.102:6800    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.134.102:6801    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.133.102:6801    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.134.102:6802    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.133.102:6802    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.134.102:6803    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.133.102:6803    0.0.0.0:*               LISTEN      2268/ceph-osd
tcp        0      0 192.168.134.102:6804    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.133.102:6804    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.134.102:6805    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.133.102:6805    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.134.102:6806    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.133.102:6806    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.134.102:6807    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.133.102:6807    0.0.0.0:*               LISTEN      2271/ceph-osd
tcp        0      0 192.168.134.102:6808    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.133.102:6808    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.134.102:6809    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.133.102:6809    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.134.102:6810    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.133.102:6810    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.134.102:6811    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.133.102:6811    0.0.0.0:*               LISTEN      2267/ceph-osd
tcp        0      0 192.168.134.102:6812    0.0.0.0:*               LISTEN      2274/ceph-osd
tcp        0      0 192.168.133.102:6812    0.0.0.0:*               LISTEN      2274/ceph-osd
tcp        0      0 192.168.134.102:6813    0.0.0.0:*               LISTEN      2274/ceph-osd
tcp        0      0 192.168.133.102:6813    0.0.0.0:*               LISTEN      2274/ceph-osd
tcp        0      0 192.168.134.102:6814    0.0.0.0:*               LISTEN      2274/ceph-osd
tcp        0      0 192.168.133.102:6814    0.0.0.0:*               LISTEN      2274/ceph-osd
tcp        0      0 192.168.134.102:6815    0.0.0.0:*               LISTEN      2274/ceph-osd
tcp        0      0 192.168.133.102:6815    0.0.0.0:*               LISTEN      2274/ceph-osd

Any idea what I can check, or what's going on?

Thanks

Eneko Lacunza
Technical Director
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


