On Sun, Feb 11, 2018 at 8:19 PM Chris Apsey <bitskr...@bitskrieg.net> wrote:

> All,
>
> Recently doubled the number of OSDs in our cluster, and towards the end
> of the rebalancing, I noticed that recovery IO fell to nothing and that
> ceph -s on the mons eventually looked like this:
>
>        cluster:
>          id:     6a65c3d0-b84e-4c89-bbf7-a38a1966d780
>          health: HEALTH_WARN
>                  34922/4329975 objects misplaced (0.807%)
>                  Reduced data availability: 542 pgs inactive, 49 pgs
> peering, 13502 pgs stale
>                  Degraded data redundancy: 248778/4329975 objects
> degraded (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs
> undersized
>
>        services:
>          mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
>          mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
>          osd: 376 osds: 376 up, 376 in
>
>        data:
>          pools:   9 pools, 13952 pgs
>          objects: 1409k objects, 5992 GB
>          usage:   31528 GB used, 1673 TB / 1704 TB avail
>          pgs:     3.225% pgs unknown
>                   0.659% pgs not active
>                   248778/4329975 objects degraded (5.745%)
>                   34922/4329975 objects misplaced (0.807%)
>                   6141 stale+active+clean
>                   4537 stale+active+remapped+backfilling
>                   1575 stale+active+undersized+degraded
>                   489  stale+active+clean+remapped
>                   450  unknown
>                   396  stale+active+recovery_wait+degraded
>                   216  stale+active+undersized+degraded+remapped+backfilling
>                   40   stale+peering
>                   30   stale+activating
>                   24   stale+active+undersized+remapped
>                   22   stale+active+recovering+degraded
>                   13   stale+activating+degraded
>                   9    stale+remapped+peering
>                   4    stale+active+remapped+backfill_wait
>                   3    stale+active+clean+scrubbing+deep
>                   2    stale+active+undersized+degraded+remapped+backfill_wait
>                   1    stale+active+remapped
>
> The problem is, everything works fine.  If I run ceph health detail and
> do a pg query against one of the 'degraded' placement groups, it reports
> back as active+clean.  All clients in the cluster can read and write at
> normal speeds, but no client IO information is ever reported in ceph -s.
>
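> For reference, the checks above were along these lines (<pgid> is just a
> placeholder for one of the pgs listed as degraded):
>
>        ceph health detail
>        ceph pg <pgid> query
>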
> From what I can see, everything in the cluster is working properly
> except the actual reporting of the cluster's status.  Has anyone seen
> this before, or does anyone know how to sync the mons up to what the
> OSDs are actually reporting?  I see no connectivity errors in the logs
> of the mons or the osds.
>

It sounds like the manager has gone stale somehow. You can probably fix it
by restarting it, though if you have logs it would be good to file a bug
report at tracker.ceph.com.
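
For what it's worth, a rough sketch of that restart, assuming a systemd
deployment and using the mgr names from your ceph -s output (adjust to your
environment):

    # on the host running the active mgr (cephmon-0 in your output)
    systemctl restart ceph-mgr@cephmon-0

    # or fail the active mgr over to a standby from any admin node
    ceph mgr fail cephmon-0

    # then check whether pg states and client IO stats start updating again
    ceph -s
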
-Greg


>
> Thanks,
>
> ---
> v/r
>
> Chris Apsey
> bitskr...@bitskrieg.net
> https://www.bitskrieg.net
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
