[ceph-users] 2 pgs stuck in active+clean+inconsistent

2014-06-06 Thread Benedikt Fraunhofer
Hello List,

The other day when I looked at our Ceph cluster it showed:

 health HEALTH_ERR 135 pgs inconsistent; 1 pgs recovering;
recovery 76/4633296 objects degraded (0.002%); 169 scrub errors; clock
skew detected on mon.mon2-nb8

I did a

 ceph pg dump | grep -i incons | cut -f 1 | while read a; do ceph pg repair $a; done

to get rid of most of these, but 2 remained; overnight it scrubbed (I
think) and raised the count to 3:

2014-06-06 03:23:53.462918 mon.0 [INF] pgmap v2623164: 10640 pgs:
10638 active+clean, 2 active+clean+inconsistent; 5657 GB data, 17068
GB used, 332 TB / 349 TB avail
2014-06-06 03:22:06.209085 osd.90 [INF] 27.58 scrub ok
2014-06-06 03:22:17.251617 osd.32 [ERR] 2.126 shard 12: soid
ec653126/rb.0.11d90.238e1f29.083e/head//2 digest 1668941108 !=
known digest 3542109454
2014-06-06 03:22:17.251929 osd.32 [ERR] 2.126 deep-scrub 0 missing, 1
inconsistent objects
2014-06-06 03:22:17.251994 osd.32 [ERR] 2.126 deep-scrub 1 errors
2014-06-06 03:23:54.471206 mon.0 [INF] pgmap v2623165: 10640 pgs:
10637 active+clean, 2 active+clean+inconsistent, 1
active+clean+scrubbing; 5657 GB data, 17068 GB used, 332 TB / 349 TB
avail

The OSD hosts all have the same uptime, and unfortunately logrotate
deleted the logs from the time this initially showed up.

I only found a post about mismatched object sizes and how to fix those
with --truncate, nothing about mismatched digests.
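
I assume one could compare the copies of the offending object by hand,
something along these lines (assuming the default filestore layout under
/var/lib/ceph/osd; the name pattern is just the fragment from the log
above, so adjust it and the osd id as needed):

 # run on every osd host in pg 2.126's acting set, with the matching osd id
 find /var/lib/ceph/osd/ceph-32/current/2.126_head/ -name '*rb.0.11d90*' -exec md5sum {} \;
 # the replica whose checksum disagrees with the others should be the bad copy

but I'd rather hear what the recommended way is before poking at the
files directly.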

The host holding osd.32 shows nothing unusual in dmesg, and SMART looks
fine to me for this disk.
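
(The checks amounted to roughly

 dmesg | grep -i -e error -e fail
 smartctl -a /dev/sdX   # sdX being a placeholder for the disk behind osd.32

with nothing suspicious in either output.)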

The current state of the cluster is

 health HEALTH_ERR 2 pgs inconsistent; 2 scrub errors; clock skew
detected on mon.mon1-nb8, mon.mon2-nb8
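
(The two affected pg ids can be pulled out with something like

 ceph health detail | grep inconsistent

which lists each inconsistent pg together with its acting osds.)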

and nothing shows up in ceph -w when I issue

ceph pg repair 2.c1
 instructing pg 2.c1 on osd.51 to repair
ceph pg repair 2.68
 instructing pg 2.68 on osd.69 to repair
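
I can also post the output of

 ceph pg 2.c1 query
 ceph pg 2.68 query

if that helps, but I don't know what to look for in there myself.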

Could you help me troubleshoot that?

Thx
  Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 2 pgs stuck in active+clean+inconsistent

2014-06-06 Thread Benedikt Fraunhofer
2014-06-06 9:18 GMT+02:00 Benedikt Fraunhofer
given.to.lists.ceph-users.ceph.com.toasta@traced.net:
Hello List,

 and nothing shows up in ceph -w when I issue

 ceph pg repair 2.c1
  instructing pg 2.c1 on osd.51 to repair
 ceph pg repair 2.68
  instructing pg 2.68 on osd.69 to repair

Rebooting the hosts holding those OSDs made them cooperative: they
accepted the repair command and the warning went away.

I guess a restart of the OSD daemons would have been enough; I was just
too lazy to figure out how to stop one specific OSD, and there were
some updates pending anyway :)
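
(For the archives: on 0.80.x a single OSD can apparently be bounced with
something like

 # upstart (Ubuntu):
 stop ceph-osd id=51 && start ceph-osd id=51
 # sysvinit:
 /etc/init.d/ceph restart osd.51

depending on the init system; I just didn't bother this time.)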

This is ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)

 Benedikt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com