I've had the same issue before during a cluster rebalancing. After restarting one of the daemons (I can't remember now whether it was one of the OSDs or MONs) the values reset to something more sane, and the cluster eventually recovered once it reached 0 objects degraded.
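In case it's useful, roughly what I mean by restarting a daemon; the exact service names and IDs below are only examples and depend on your distro and init system, so adjust them to your own cluster:

    # sysvinit-style, run on the node hosting the daemon
    service ceph restart osd.12
    service ceph restart mon.a

    # systemd-style equivalents
    systemctl restart ceph-osd@12
    systemctl restart ceph-mon@a

Then just keep an eye on "ceph -s" until the degraded count settles back to something sane.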
Additionally, when you have a large number of objects to recover, "ceph osd pool stats" will print a negative number of objects to recover and/or a negative total number of objects (a quick example is in the P.S. below).

On Thu, Oct 30, 2014 at 10:14 PM, Mike Dawson <mike.daw...@cloudapt.com> wrote:
> Erik,
>
> I reported a similar issue 22 months ago. I don't think any developer has
> ever really prioritized these issues.
>
> http://tracker.ceph.com/issues/3720
>
> I was able to recover that cluster. The method I used is in the comments.
> I have no idea if my cluster was broken for the same reason as yours. Your
> results may vary.
>
> - Mike Dawson
>
> On 10/30/2014 4:50 PM, Erik Logtenberg wrote:
>
>> Thanks for pointing that out. Unfortunately, those tickets contain only
>> a description of the problem, but no solution or workaround. One was
>> opened 8 months ago and the other more than a year ago. No love since.
>>
>> Is there any way I can get my cluster back in a healthy state?
>>
>> Thanks,
>>
>> Erik.
>>
>> On 10/30/2014 05:13 PM, John Spray wrote:
>>
>>> There are a couple of open tickets about bogus (negative) stats on PGs:
>>> http://tracker.ceph.com/issues/5884
>>> http://tracker.ceph.com/issues/7737
>>>
>>> Cheers,
>>> John
>>>
>>> On Thu, Oct 30, 2014 at 12:38 PM, Erik Logtenberg <e...@logtenberg.eu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yesterday I removed two OSDs, to replace them with new disks. Ceph was
>>>> not able to completely reach the all active+clean state, and some
>>>> degraded objects remain. However, the number of degraded objects is
>>>> negative (-82), see below:
>>>>
>>>> 2014-10-30 13:31:32.862083 mon.0 [INF] pgmap v209175: 768 pgs: 761
>>>> active+clean, 7 active+remapped; 1644 GB data, 2524 GB used, 17210 GB /
>>>> 19755 GB avail; 2799 B/s wr, 1 op/s; -82/1439391 objects degraded
>>>> (-0.006%)
>>>>
>>>> According to "rados df", the -82 degraded objects are part of the
>>>> cephfs-data-cache pool, which is an SSD-backed replicated pool that
>>>> functions as a cache pool for an HDD-backed erasure-coded pool for
>>>> cephfs.
>>>>
>>>> The cache should be empty, because I issued the "rados
>>>> cache-flush-evict-all" command, and "rados -p cephfs-data-cache ls"
>>>> indeed shows zero objects in this pool.
>>>>
>>>> "rados df" however does show 192 objects for this pool, with just 35 KB
>>>> used and -82 degraded:
>>>>
>>>> pool name          category   KB  objects  clones  degraded  unfound    rd   rd KB       wr       wr KB
>>>> cephfs-data-cache  -          35      192       0       -82        0  1119  348800  1198371  1703673493
>>>>
>>>> Please advise...
>>>>
>>>> Thanks,
>>>>
>>>> Erik.
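P.S. To be clear about the counters I mentioned at the top, a minimal way to check them per pool is simply (the pool name here is only an example, taken from the thread above):

    ceph osd pool stats
    ceph osd pool stats cephfs-data-cache

That per-pool view is where the negative "objects to recover" figures show up during large recoveries.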