Hello List, the other day when I looked at our Ceph cluster it showed:
health HEALTH_ERR 135 pgs inconsistent; 1 pgs recovering; recovery 76/4633296 objects degraded (0.002%); 169 scrub errors; clock skew detected on mon.mon2-nb8

I ran

    ceph pg dump | grep -i incons | cut -f 1 | while read a; do ceph pg repair $a & done

to get rid of most of these, but 2 remained; overnight it scrubbed (I think) and the count rose to 3:

    2014-06-06 03:23:53.462918 mon.0 [INF] pgmap v2623164: 10640 pgs: 10638 active+clean, 2 active+clean+inconsistent; 5657 GB data, 17068 GB used, 332 TB / 349 TB avail
    2014-06-06 03:22:06.209085 osd.90 [INF] 27.58 scrub ok
    2014-06-06 03:22:17.251617 osd.32 [ERR] 2.126 shard 12: soid ec653126/rb.0.11d90.238e1f29.00000000083e/head//2 digest 1668941108 != known digest 3542109454
    2014-06-06 03:22:17.251929 osd.32 [ERR] 2.126 deep-scrub 0 missing, 1 inconsistent objects
    2014-06-06 03:22:17.251994 osd.32 [ERR] 2.126 deep-scrub 1 errors
    2014-06-06 03:23:54.471206 mon.0 [INF] pgmap v2623165: 10640 pgs: 10637 active+clean, 2 active+clean+inconsistent, 1 active+clean+scrubbing; 5657 GB data, 17068 GB used, 332 TB / 349 TB avail

The OSD hosts have the same uptime, and unfortunately logrotate deleted the logs from when this first showed up. I only found a post about mismatched sizes and how to fix those with --truncate, nothing about digests. The host holding osd.32 shows nothing unusual in dmesg, and SMART looks fine to me for this disk.

The current state of the cluster is

    health HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; clock skew detected on mon.mon1-nb8, mon.mon2-nb8

and nothing is logged in "ceph -w" when I issue

    ceph pg repair 2.c1
    instructing pg 2.c1 on osd.51 to repair
    ceph pg repair 2.68
    instructing pg 2.68 on osd.69 to repair

Could you help me troubleshoot this?

Thanks,
Benedikt

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
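P.S. For reference, a dry-run variant of the repair loop I used can be sketched like this. It only prints the repair commands instead of running them, so the list can be reviewed first. The sample_dump output below is invented for illustration (the real field layout of `ceph pg dump` has many more columns); against a live cluster you would pipe `ceph pg dump` in its place:

```shell
#!/bin/sh
# Dry-run sketch: list the `ceph pg repair` commands that the one-liner
# would fire, without executing anything against the cluster.

# Stand-in for `ceph pg dump` output -- invented sample data, not real
# cluster state. Replace `sample_dump` with `ceph pg dump` for real use.
sample_dump() {
cat <<'EOF'
pg_stat	objects	...	state
2.126	1043	...	active+clean+inconsistent
2.c1	998	...	active+clean+inconsistent
27.58	1021	...	active+clean
EOF
}

# Match on the state column instead of grepping whole lines, and only
# print the commands (drop the leading `echo` to actually run them).
sample_dump | awk '$NF ~ /inconsistent/ {print "ceph pg repair " $1}'
```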