Hello,

I have a few OSDs in my cluster that are regularly crashing.

In their logs I can see:

osd.7
-1> 2016-10-06 08:09:18.869687 7ffaa037f700 -1 osd.7 pg_epoch: 128840 pg[5.3as0( v 84797'30080 (67219'27080,84797'30080] local-les=128834 n=13146 ec=61149 les/c 128834/127358 128829/128829/128829) [7,109,4,0,62,32]/[7,109,32,0,62,39] r=0 lpr=128829 pi=127357-128828/12 rops=5 bft=4(2),32(5) crt=0'0 lcod 0'0 mlcod 0'0 active+remapped+backfilling] handle_recovery_read_complete: inconsistent shard sizes 5/abc6d43a/rbd_data.33640a238e1f29.000000000003b165/head the offending shard must be manually removed after verifying there are enough shards to recover (0, 8388608, [32(2),0, 39(5),0])


osd.32
-411> 2016-10-06 13:21:15.166968 7fe45b6cb700 -1 osd.32 pg_epoch: 129181 pg[5.3as2( v 84797'30080 (67219'27080,84797'30080] local-les=129171 n=13146 ec=61149 les/c 129171/127358 129170/129170/129170) [2147483647,2147483647,4,0,62,32]/[2147483647,2147483647,32,0,62,39] r=2 lpr=129170 pi=121260-129169/43 rops=5 bft=4(2),32(5) crt=0'0 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling] handle_recovery_read_complete: inconsistent shard sizes 5/abc6d43a/rbd_data.33640a238e1f29.000000000003b165/head the offending shard must be manually removed after verifying there are enough shards to recover (0, 8388608, [32(2),0, 39(5),0])



osd.109
-1> 2016-10-06 13:17:36.748340 7fa53d36c700 -1 osd.109 pg_epoch: 129167 pg[5.3as1( v 84797'30080 (66310'24592,84797'30080] local-les=129163 n=13146 ec=61149 les/c 129163/127358 129162/129162/129162) [2147483647,109,4,0,62,32]/[2147483647,109,32,0,62,39] r=1 lpr=129162 pi=112552-129161/59 rops=5 bft=4(2),32(5) crt=84797'30076 lcod 0'0 mlcod 0'0 active+undersized+degraded+remapped+backfilling] handle_recovery_read_complete: inconsistent shard sizes 5/abc6d43a/rbd_data.33640a238e1f29.000000000003b165/head the offending shard must be manually removed after verifying there are enough shards to recover (0, 8388608, [32(2),0, 39(5),0])


Of course, having 3 OSDs dying regularly is not good for my health, so I have set noout to avoid heavy recoveries.
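For reference, that is just the standard flag, nothing exotic, to be reverted once this is sorted:

  ceph osd set noout      # keep the crashed OSDs from being marked out
  ceph osd unset noout    # once the PG is healthy again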

Googling this error message gives exactly one hit:
https://github.com/ceph/ceph/pull/6946

where it says: "the shard must be removed so it can be reconstructed".
But with my 3 OSDs failing, I am not certain which of them contains the broken shard (or perhaps all 3 of them?).

I am a bit reluctant to delete on all 3. I have 4+2 erasure coding
(erasure size 6, min_size 4), so finding out which one is bad would be nice.
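My tentative plan, in case it helps the discussion: assuming these OSDs are filestore with the default paths (which I have not double-checked), I was thinking of comparing the on-disk size of that shard on each OSD that holds PG 5.3a, something like:

  # match on the object suffix only, since filestore escapes the rbd_data prefix in filenames
  find /var/lib/ceph/osd/ceph-*/current/5.3as*_head/ -name '*33640a238e1f29.000000000003b165*' -ls

or, with the OSD stopped, locating the object with ceph-objectstore-tool (paths and shard id below are just an example for osd.32):

  systemctl stop ceph-osd@32
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-32 \
      --journal-path /var/lib/ceph/osd/ceph-32/journal \
      --pgid 5.3as2 --op list rbd_data.33640a238e1f29.000000000003b165

and then removing only the shard whose size disagrees with its peers. But I would rather have someone confirm that approach before I delete anything.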

I hope someone has an idea how to proceed.

Kind regards,
Ronny Aasen

