Hello List,

The other day when I looked at our Ceph cluster it showed:

     health HEALTH_ERR 135 pgs inconsistent; 1 pgs recovering;
recovery 76/4633296 objects degraded (0.002%); 169 scrub errors; clock
skew detected on mon.mon2-nb8

I ran

 ceph pg dump  | grep -i incons | cut -f 1 | while read a; do ceph pg
repair $a & done

to get rid of most of these, but 2 remained; overnight it scrubbed (I
think) and raised it to 3:

2014-06-06 03:23:53.462918 mon.0 [INF] pgmap v2623164: 10640 pgs:
10638 active+clean, 2 active+clean+inconsistent; 5657 GB data, 17068
GB used, 332 TB / 349 TB avail
2014-06-06 03:22:06.209085 osd.90 [INF] 27.58 scrub ok
2014-06-06 03:22:17.251617 osd.32 [ERR] 2.126 shard 12: soid
ec653126/rb.0.11d90.238e1f29.00000000083e/head//2 digest 1668941108 !=
known digest 3542109454
2014-06-06 03:22:17.251929 osd.32 [ERR] 2.126 deep-scrub 0 missing, 1
inconsistent objects
2014-06-06 03:22:17.251994 osd.32 [ERR] 2.126 deep-scrub 1 errors
2014-06-06 03:23:54.471206 mon.0 [INF] pgmap v2623165: 10640 pgs:
10637 active+clean, 2 active+clean+inconsistent, 1
active+clean+scrubbing; 5657 GB data, 17068 GB used, 332 TB / 349 TB
avail
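For reference, the grep/cut step in the one-liner above just picks the pgid out of the tab-separated dump (which is why "cut -f 1" works); a minimal sketch on a sample line, assuming the column layout of our version:

```shell
# A sample line shaped like "ceph pg dump" output: the pgid is the
# first tab-separated column.
line="$(printf '2.126\t1234\t0\t0\tactive+clean+inconsistent')"
pgid=$(echo "$line" | grep -i incons | cut -f 1)
echo "$pgid"   # -> 2.126

# Against the live cluster this becomes the loop from above:
# ceph pg dump | grep -i incons | cut -f 1 | while read a; do ceph pg repair "$a" & done
```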

The OSD hosts have the same uptime, and unfortunately logrotate
deleted the logs from the time this initially showed up.

I only found a post about mismatched sizes and how to fix those with
--truncate, nothing about digest mismatches.

The host holding osd.32 reports nothing in dmesg, and SMART looks fine
to me for this disk.

The current state of the cluster is:

     health HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; clock skew
detected on mon.mon1-nb8, mon.mon2-nb8

and nothing is logged in "ceph -w" when I issue:

ceph pg repair 2.c1
 instructing pg 2.c1 on osd.51 to repair
ceph pg repair 2.68
 instructing pg 2.68 on osd.69 to repair
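In case it helps, this is how I was planning to compare the replica copies by hand: the soid printed by the deep-scrub error encodes hash/name/snap//pool, so the object name can be cut out of it. The pool name placeholder and the filestore path below are assumptions (default layout), not something I have verified:

```shell
# Pull the object name out of the soid printed by the deep-scrub error.
soid='ec653126/rb.0.11d90.238e1f29.00000000083e/head//2'
obj=$(echo "$soid" | cut -d/ -f2)
echo "$obj"   # -> rb.0.11d90.238e1f29.00000000083e

# From here on a live cluster is needed (hypothetical pool name/paths):
# ceph osd map <pool> "$obj"          # shows which OSDs hold the object
# find /var/lib/ceph/osd/ceph-32/current/2.126_head -name "*${obj}*"
# then md5sum the file on each replica host and compare the sums
```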

Could you help me troubleshoot this?

Thx
  Benedikt
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
