Rebalancing is almost finished, but things got even worse: http://i.imgur.com/0HOPZil.png
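To put a number on the spread, a quick sketch like the one below (run against the json output of "ceph pg dump"; the "pg_stats"/"acting" field names are from memory and may differ between releases) counts PGs per OSD:

    #!/usr/bin/env python
    # Count PGs per OSD from "ceph pg dump -f json" piped in on stdin.
    # Usage: ceph pg dump -f json | python pg_per_osd.py
    # Field names ("pg_stats", "acting") may differ between ceph releases.
    import json
    import sys
    from collections import Counter

    per_osd = Counter()
    for pg in json.load(sys.stdin)["pg_stats"]:
        for osd in pg["acting"]:
            per_osd[osd] += 1

    counts = sorted(per_osd.values())
    print("%d osds, pgs per osd: min %d, max %d, spread %.0f%%"
          % (len(counts), counts[0], counts[-1],
             100.0 * (counts[-1] - counts[0]) / counts[0]))
    for osd, n in per_osd.most_common(5):
        print("osd.%d holds %d pgs" % (osd, n))

Counting acting sets rather than raw disk usage shows the imbalance ceph itself creates, independent of any skew in object sizes.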
Moreover, one pg is in active+remapped+wait_backfill+backfill_toofull state:

    2015-01-05 19:39:31.995665 mon.0 [INF] pgmap v3979616: 5832 pgs: 23 active+remapped+wait_backfill, 1 active+remapped+wait_backfill+backfill_toofull, 2 active+remapped+backfilling, 5805 active+clean, 1 active+remapped+backfill_toofull; 11210 GB data, 26174 GB used, 18360 GB / 46906 GB avail; 65246/10590590 objects degraded (0.616%)

So at 55.8% overall disk utilization ceph already reports backfill_toofull. That doesn't look good.

On 5 January 2015 at 15:39, ivan babrou <[email protected]> wrote:

> On 5 January 2015 at 14:20, Christian Balzer <[email protected]> wrote:
>
>> On Mon, 5 Jan 2015 14:04:28 +0400 ivan babrou wrote:
>>
>> > Hi!
>> >
>> > I have a cluster with 106 osds and disk usage varying from 166gb to
>> > 316gb. Disk usage is highly correlated to the number of pgs per osd
>> > (no surprise here). Is there a reason for ceph to allocate more pgs
>> > on some nodes?
>> >
>> In essence what Wido said, you're a bit low on PGs.
>>
>> Also given your current utilization, pool 14 is totally oversized with
>> 1024 PGs. You might want to re-create it with a smaller size, and double
>> pool 0 to 512 PGs and pool 10 to 4096.
>> I assume you did raise the PGPs as well when changing the PGs, right?
>>
>
> Yep, pg = pgp for all pools. Pool 14 is just for testing purposes; it
> might get large eventually.
>
> I followed your advice in doubling pools 0 and 10. It is rebalancing at
> 30% degraded now, but so far the big osds are getting bigger and the
> small ones smaller: http://i.imgur.com/hJcX9Us.png. I hope that trend
> changes before rebalancing is complete.
>
>> And yeah, CEPH isn't particularly good at balancing stuff by itself, but
>> with sufficient PGs you ought to get the variance below/around 30%.
>>
>
> Is this going to change in future releases?
>
>> Christian
>>
>> > The biggest osds are 30, 42 and 69 (300gb+ each) and the smallest are
>> > 87, 33 and 55 (170gb each). The biggest pool has 2048 pgs, pools with
>> > very little data have only 8 pgs. PG size in the biggest pool is ~6gb
>> > (5.1..6.3 actually).
>> >
>> > Lack of balanced disk usage prevents me from using all the disk space.
>> > When the biggest osd is full, the cluster does not accept writes
>> > anymore.
>> >
>> > Here's a gist with info about my cluster:
>> > https://gist.github.com/bobrik/fb8ad1d7c38de0ff35ae
>> >
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> [email protected]        Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
>>
>
> --
> Regards, Ian Babrou
> http://bobrik.name  http://twitter.com/ibobrik  skype:i.babrou
>

--
Regards, Ian Babrou
http://bobrik.name  http://twitter.com/ibobrik  skype:i.babrou
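One more note on the backfill_toofull state and the spread: backfill_toofull is decided per OSD against osd_backfill_full_ratio (0.85 by default), so a handful of overfull OSDs can stall backfill even though the cluster as a whole is only ~56% used. The spread itself is largely a statistics problem: 2048 PGs x 3 copies over 106 OSDs is only ~58 PGs per OSD, and pseudo-random placement at that density produces big outliers. A toy model (uniform random placement, not CRUSH, just to show the trend) makes the point:

    #!/usr/bin/env python
    # Toy model: place pg_num * replicas PG copies uniformly at random
    # across num_osds OSDs and report the worst min-to-max spread seen.
    # Not CRUSH, only an illustration of why more PGs per OSD means a
    # flatter distribution.
    import random
    from collections import Counter

    def worst_spread(pg_num, replicas=3, num_osds=106, trials=20):
        worst = 0.0
        for _ in range(trials):
            per_osd = Counter(random.randrange(num_osds)
                              for _ in range(pg_num * replicas))
            counts = sorted(per_osd.values())
            worst = max(worst, float(counts[-1] - counts[0]) / counts[0])
        return worst

    for pg_num in (2048, 4096, 8192):
        print("pg_num=%d: worst spread %.0f%%"
              % (pg_num, worst_spread(pg_num) * 100))

Raising pg_num flattens the distribution over time, and for the OSDs that are already outliers, ceph osd reweight-by-utilization (or a manual ceph osd reweight on the fattest ones) can shave off the peaks without waiting for a full re-split.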
