Hello List,

The other day when I looked at our Ceph cluster, it showed:

     health HEALTH_ERR 135 pgs inconsistent; 1 pgs recovering;
recovery 76/4633296 objects degraded (0.002%); 169 scrub errors; clock
skew detected on mon.mon2-nb8

I did a

 ceph pg dump | grep -i incons | cut -f 1 | while read a; do ceph pg repair $a & done

to get rid of most of these, but 2 remained. Overnight it scrubbed (I
think) and raised the count to 3:

2014-06-06 03:23:53.462918 mon.0 [INF] pgmap v2623164: 10640 pgs:
10638 active+clean, 2 active+clean+inconsistent; 5657 GB data, 17068
GB used, 332 TB / 349 TB avail
2014-06-06 03:22:06.209085 osd.90 [INF] 27.58 scrub ok
2014-06-06 03:22:17.251617 osd.32 [ERR] 2.126 shard 12: soid
ec653126/rb.0.11d90.238e1f29.00000000083e/head//2 digest 1668941108 !=
known digest 3542109454
2014-06-06 03:22:17.251929 osd.32 [ERR] 2.126 deep-scrub 0 missing, 1
inconsistent objects
2014-06-06 03:22:17.251994 osd.32 [ERR] 2.126 deep-scrub 1 errors
2014-06-06 03:23:54.471206 mon.0 [INF] pgmap v2623165: 10640 pgs:
10637 active+clean, 2 active+clean+inconsistent, 1
active+clean+scrubbing; 5657 GB data, 17068 GB used, 332 TB / 349 TB
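
For reference, a sequential variant of the repair loop above might look
like the sketch below. It assumes the PG id is the first
whitespace-separated column of "ceph pg dump" and that the state column
contains the word "inconsistent"; the function names are made up for
illustration.

```shell
# Print the PG id (first column) of every line that mentions
# "inconsistent". Reads `ceph pg dump`-style text on stdin.
# Assumption: PG ids look like <pool>.<hex>, e.g. 2.126 or 2.c1.
list_inconsistent_pgs() {
    awk '/inconsistent/ && $1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }'
}

# Issue the repairs one PG at a time instead of backgrounding
# all of them at once with "&".
repair_inconsistent_pgs() {
    ceph pg dump 2>/dev/null | list_inconsistent_pgs | while read -r pg; do
        ceph pg repair "$pg"
    done
}
```

Calling repair_inconsistent_pgs on a node with admin access would then
request the repairs in sequence; list_inconsistent_pgs can be tried on
its own against saved "ceph pg dump" output first.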

The OSD hosts all have the same uptime, and unfortunately logrotate has
already deleted the logs from the time the errors first showed up.

I only found a post about mismatched sizes and how to fix those with
--truncate, but nothing about mismatched digests.

The host holding osd.32 shows nothing unusual in dmesg, and the SMART
data for this disk looks fine to me.

The current state of the cluster is

     health HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; clock skew
detected on mon.mon1-nb8, mon.mon2-nb8

and nothing is logged in "ceph -w" when I issue

ceph pg repair 2.c1
 instructing pg 2.c1 on osd.51 to repair
ceph pg repair 2.68
 instructing pg 2.68 on osd.69 to repair

Could you help me troubleshoot that?

