On Thursday, June 27, 2013, Greg Chavez wrote:
> We set up a small ceph cluster of three nodes on top of an OpenStack
> deployment of three nodes (that is, each compute node was also an
> OSD/MON node). It worked great until the OSDs started to fill up and
> we began expanding the ceph cluster. I added 4 OSDs two days ago
> and the recovery went smoothly. I added another four last night, but
> the recovery is stuck:
>
> root@kvm-sn-14i:~# ceph -s
> health HEALTH_WARN 22 pgs backfill_toofull; 19 pgs degraded; 1 pgs
> recovering; 23 pgs stuck unclean; recovery 157614/1775814 degraded
> (8.876%); recovering 2 o/s, 8864KB/s; 1 near full osd(s)
> monmap e1: 3 mons at
> {kvm-cs-sn-10i=
> 192.168.241.110:6789/0,kvm-cs-sn-14i=192.168.241.114:6789/0,kvm-cs-sn-15i=192.168.241.115:6789/0
> },
> election epoch 42, quorum 0,1,2
> kvm-cs-sn-10i,kvm-cs-sn-14i,kvm-cs-sn-15i
> osdmap e512: 30 osds: 27 up, 27 in
> pgmap v1474651: 448 pgs: 425 active+clean, 1
> active+recovering+remapped, 3 active+remapped+backfill_toofull, 11
> active+degraded+backfill_toofull, 8
> active+degraded+remapped+backfill_toofull; 3414 GB data, 6640 GB used,
> 7007 GB / 13647 GB avail; 0B/s rd, 2363B/s wr, 0op/s; 157614/1775814
> degraded (8.876%); recovering 2 o/s, 8864KB/s
> mdsmap e1: 0/0/1 up
>
> Even after restarting the OSDs, it hangs at 8.876%. Consequently,
> many of our virts have crashed.
>
> I'm hoping someone on this list can provide some suggestions.
> Otherwise, I may have to blow this up. Thanks!
"Backfill_toofull"
Right now your OSDs are trying to move data around, but one or more of your
OSDs are getting full so it's paused the data transfer.
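To see which OSD is near full and which PGs are blocked on it, "ceph health
detail" will list them. If you just need backfill to make progress while you
sort out the balance, you could also nudge the backfill threshold up a little.
This is just a sketch, assuming the default osd_backfill_full_ratio of 0.85;
don't push it much past 0.90 or you risk actually filling the disk:

    ceph health detail
    # temporary bump from the (assumed) default of 0.85
    ceph tell osd.\* injectargs '--osd-backfill-full-ratio 0.90'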
Now, given that all the PGs are active, your clients shouldn't really notice,
but you might have hit an edge case we didn't account for. Do you have any
logging enabled?
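If not, you could turn up the OSD debug levels on the fly while it's stuck;
the levels below are only a suggestion (the logs will grow quickly):

    # suggested debug levels, not a requirement
    ceph tell osd.\* injectargs '--debug-osd 20 --debug-ms 1'

and then send along the logs from the OSDs hosting the stuck PGs.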
Anyway, at a guess your OSDs don't all have weights proportional to their
sizes. Check the actual disk sizes against the output of "ceph osd tree" to
make sure they match, and that the tree is set up properly compared to your
CRUSH map.
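For example, with the usual convention of one unit of CRUSH weight per TB, a
1 TB OSD showing up in "ceph osd tree" with a weight of 0.5 is the kind of
mismatch to look for, and "ceph osd crush reweight" will fix it (osd.27 and
the weight below are just hypothetical):

    ceph osd tree
    # hypothetical ID and weight; use your own OSD and its size in TB
    ceph osd crush reweight osd.27 1.0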
-Greg
--
Software Engineer #42 @ http://inktank.com | http://ceph.com