I've had the same issue before, during a cluster rebalance. After restarting
one of the daemons (I can't remember now whether it was one of the OSDs or
one of the MONs), the counters reset to sane values and the cluster
eventually recovered once it reached 0 objects degraded.
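
If you want to try the same thing, something along these lines should do it
(the OSD id and the init commands below are just placeholders for whatever
your setup uses):

  ceph osd tree                      # pick an OSD to bounce
  sudo service ceph restart osd.3    # sysvinit; on systemd: systemctl restart ceph-osd@3
  ceph -w                            # watch whether the degraded count settles back to 0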

Additionally, when you have a large number of objects to recover, "ceph osd
pool stats" can print a negative number of objects to recover and/or a
negative total number of objects.
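
If you want to see which pool is carrying the bogus count, something like
this should show it (the pool name below is just an example):

  ceph osd pool stats                      # per-pool recovery/degraded stats
  ceph osd pool stats cephfs-data-cache    # or a single pool
  rados df                                 # cross-check the per-pool object counts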

On Thu, Oct 30, 2014 at 10:14 PM, Mike Dawson <mike.daw...@cloudapt.com>
wrote:

> Erik,
>
> I reported a similar issue 22 months ago. I don't think any developer has
> ever really prioritized these issues.
>
> http://tracker.ceph.com/issues/3720
>
> I was able to recover that cluster. The method I used is in the comments.
> I have no idea if my cluster was broken for the same reason as yours. Your
> results may vary.
>
> - Mike Dawson
>
>
>
> On 10/30/2014 4:50 PM, Erik Logtenberg wrote:
>
>> Thanks for pointing that out. Unfortunately, those tickets contain only
>> a description of the problem, but no solution or workaround. One was
>> opened 8 months ago and the other more than a year ago. No love since.
>>
>> Is there any way I can get my cluster back in a healthy state?
>>
>> Thanks,
>>
>> Erik.
>>
>>
>> On 10/30/2014 05:13 PM, John Spray wrote:
>>
>>> There are a couple of open tickets about bogus (negative) stats on PGs:
>>> http://tracker.ceph.com/issues/5884
>>> http://tracker.ceph.com/issues/7737
>>>
>>> Cheers,
>>> John
>>>
>>> On Thu, Oct 30, 2014 at 12:38 PM, Erik Logtenberg <e...@logtenberg.eu>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yesterday I removed two OSDs to replace them with new disks. Ceph was
>>>> not able to get everything back to the active+clean state; some degraded
>>>> objects remain. However, the number of degraded objects is negative
>>>> (-82), see below:
>>>>
>>>> 2014-10-30 13:31:32.862083 mon.0 [INF] pgmap v209175: 768 pgs: 761
>>>> active+clean, 7 active+remapped; 1644 GB data, 2524 GB used, 17210 GB /
>>>> 19755 GB avail; 2799 B/s wr, 1 op/s; -82/1439391 objects degraded
>>>> (-0.006%)
>>>>
>>>> According to "rados df", the -82 degraded objects are part of the
>>>> cephfs-data-cache pool, which is an SSD-backed replicated pool that
>>>> functions as a cache pool for an HDD-backed erasure-coded pool for
>>>> cephfs.
>>>>
>>>> The cache should be empty, because I issued the "rados
>>>> cache-flush-evict-all" command, and "rados -p cephfs-data-cache ls"
>>>> indeed shows zero objects in this pool.
>>>>
>>>> "rados df" however does show 192 objects for this pool, with just 35KB
>>>> used and -82 degraded:
>>>>
>>>> pool name          category    KB  objects  clones  degraded  unfound    rd   rd KB       wr       wr KB
>>>> cephfs-data-cache  -           35      192       0       -82        0  1119  348800  1198371  1703673493
>>>>
>>>> Please advise...
>>>>
>>>> Thanks,
>>>>
>>>> Erik.