On Sun, Feb 11, 2018 at 8:19 PM Chris Apsey <[email protected]> wrote:
> All,
>
> Recently doubled the number of OSDs in our cluster, and towards the end
> of the rebalancing, I noticed that recovery IO fell to nothing and that
> the ceph mons eventually looked like this when I ran ceph -s:
>
>   cluster:
>     id:     6a65c3d0-b84e-4c89-bbf7-a38a1966d780
>     health: HEALTH_WARN
>             34922/4329975 objects misplaced (0.807%)
>             Reduced data availability: 542 pgs inactive, 49 pgs peering,
>             13502 pgs stale
>             Degraded data redundancy: 248778/4329975 objects degraded
>             (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs
>             undersized
>
>   services:
>     mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
>     mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
>     osd: 376 osds: 376 up, 376 in
>
>   data:
>     pools:   9 pools, 13952 pgs
>     objects: 1409k objects, 5992 GB
>     usage:   31528 GB used, 1673 TB / 1704 TB avail
>     pgs:     3.225% pgs unknown
>              0.659% pgs not active
>              248778/4329975 objects degraded (5.745%)
>              34922/4329975 objects misplaced (0.807%)
>              6141 stale+active+clean
>              4537 stale+active+remapped+backfilling
>              1575 stale+active+undersized+degraded
>              489  stale+active+clean+remapped
>              450  unknown
>              396  stale+active+recovery_wait+degraded
>              216  stale+active+undersized+degraded+remapped+backfilling
>              40   stale+peering
>              30   stale+activating
>              24   stale+active+undersized+remapped
>              22   stale+active+recovering+degraded
>              13   stale+activating+degraded
>              9    stale+remapped+peering
>              4    stale+active+remapped+backfill_wait
>              3    stale+active+clean+scrubbing+deep
>              2    stale+active+undersized+degraded+remapped+backfill_wait
>              1    stale+active+remapped
>
> The problem is, everything works fine. If I run ceph health detail and
> do a pg query against one of the 'degraded' placement groups, it reports
> back as active+clean. All clients in the cluster can write and read at
> normal speeds, but no IO information is ever reported in ceph -s.
>
> From what I can see, everything in the cluster is working properly
> except the actual reporting on the status of the cluster.
> Has anyone seen this before/know how to sync the mons up to what the
> OSDs are actually reporting? I see no connectivity errors in the logs
> of the mons or the osds.

It sounds like the manager has gone stale somehow. You can probably fix
it by restarting, though if you have logs it would be good to file a bug
report at tracker.ceph.com.
-Greg

>
> Thanks,
>
> ---
> v/r
>
> Chris Apsey
> [email protected]
> https://www.bitskrieg.net
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
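[Editor's note: a minimal sketch of the restart Greg suggests, assuming
systemd-managed daemons as in a typical Luminous deployment; the unit id
"cephmon-0" is taken from the active-mgr line in the status above and
may differ on your hosts.]

```shell
# On the host running the active ceph-mgr (cephmon-0 per the status above),
# restart the mgr so it re-syncs its view of PG state from the OSDs:
sudo systemctl restart ceph-mgr@cephmon-0

# Confirm which mgr is now active, then check that IO stats reappear:
ceph mgr dump | grep active_name
ceph -s
```

If a standby takes over during the restart, that is fine; what matters
is that the newly active mgr reports fresh PG stats.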
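[Editor's note: to tell a stale mon/mgr view apart from real PG trouble,
one can count how many PG states carry the "stale" flag in the JSON form
of the status. A minimal sketch; the `pgmap`/`pgs_by_state` layout is an
assumption based on Luminous-era `ceph -s --format json` output, and the
embedded sample uses counts from the status in the thread above.]

```python
import json

# Illustrative fragment of `ceph -s --format json` output (schema assumed;
# counts taken from the ceph -s output quoted earlier in the thread).
sample = json.loads("""
{
  "pgmap": {
    "num_pgs": 13952,
    "pgs_by_state": [
      {"state_name": "stale+active+clean",               "count": 6141},
      {"state_name": "stale+active+remapped+backfilling", "count": 4537},
      {"state_name": "unknown",                           "count": 450}
    ]
  }
}
""")

def stale_pg_count(status):
    """Sum the counts of all PG states that include the 'stale' flag."""
    return sum(s["count"]
               for s in status["pgmap"]["pgs_by_state"]
               if "stale" in s["state_name"].split("+"))

print(stale_pg_count(sample))  # 10678 stale PGs in this sample
```

A large stale count alongside healthy client IO (as reported here) points
at the reporting path (mgr) rather than the OSDs themselves, since "stale"
only means the mon/mgr has not received recent stats for those PGs.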
