Deploying or removing OSDs in parallel can certainly save elapsed time and
avoid moving data more than once.  There are pitfalls, though, and the
strategy needs careful planning.

- Deploying a new OSD at full weight means a lot of write operations.  Running 
multiple whole-OSD backfills to a single host can — depending on your situation 
— saturate the HBA, resulting in slow requests. 
- Judicious setting of norebalance/norecover can help somewhat, to give the 
affected OSDs/PGs time to peer and become ready before shoving data at them
- Deploying at 0 CRUSH weight and incrementally ratcheting up the weight as 
PGs peer can spread that load out
- I’ve recently seen the idea of temporarily setting primary-affinity to 0 on 
the affected OSDs to deflect some competing traffic as well; a rough sketch 
combining these ideas follows this list
- One workaround: if you have OSDs to deploy on more than one server, you can 
deploy them in batches of, say, 1-2 per server, striping them if you will.  
That diffuses the impact and shortens the overall elapsed recovery time
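
To make that concrete, here is a rough sketch of the 0-weight /
incremental-reweight approach, driving the ceph CLI from Python.  The OSD
IDs, target weight, step size, and pause durations are made-up examples for
illustration, and in real life you'd watch ceph -s (or poll ceph pg stat
--format json) rather than sleep blindly:

    #!/usr/bin/env python3
    # Rough sketch only: bring new OSDs in gently by ratcheting up their
    # CRUSH weight, pausing rebalancing while PGs peer, and deflecting
    # primary reads.  Assumes the OSDs already exist and were created with
    # a CRUSH weight of 0.  IDs and numbers below are hypothetical.
    import subprocess
    import time

    NEW_OSDS = [40, 41]        # hypothetical new OSD IDs
    TARGET_WEIGHT = 3.64       # e.g. a 4 TB HDD
    STEP = 0.5                 # CRUSH weight added per round
    PEER_PAUSE = 60            # seconds to let PGs peer after each reweight
    BACKFILL_PAUSE = 1800      # seconds to let each round of backfill drain

    def ceph(*args):
        """Run a ceph CLI command, raising if it fails."""
        subprocess.run(["ceph", *args], check=True)

    # Deflect client reads away from the empty OSDs while they fill up
    for osd in NEW_OSDS:
        ceph("osd", "primary-affinity", f"osd.{osd}", "0")

    weight = 0.0
    while weight < TARGET_WEIGHT:
        weight = min(weight + STEP, TARGET_WEIGHT)
        ceph("osd", "set", "norebalance")    # hold data movement while PGs peer
        for osd in NEW_OSDS:
            ceph("osd", "crush", "reweight", f"osd.{osd}", str(weight))
        time.sleep(PEER_PAUSE)
        ceph("osd", "unset", "norebalance")  # let this round of backfill proceed
        time.sleep(BACKFILL_PAUSE)

    # Restore normal primary selection once the data is in place
    for osd in NEW_OSDS:
        ceph("osd", "primary-affinity", f"osd.{osd}", "1")

Waiting on actual PG states instead of fixed sleeps would be the more robust
design, but the above shows the shape of it.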

As for how many are safe to do in parallel, there are multiple variables: 
HDD vs. SSD, client workload, and especially how many other OSDs are in the 
same logical rack/host.  On a cluster of 450 OSDs, with 150 in each logical 
rack, each OSD is less than 1% of a rack, so deploying 4 of them at once would 
not be a massive change.  However, in a smaller cluster with say 45 OSDs, 15 in 
each rack, that would tickle a much larger fraction of the cluster and be more 
disruptive.

If the numbers below are TOTALS, i.e. you would be expanding your cluster from 
a total of 4 OSDs to a total of 8, that is something I wouldn’t do, having 
experienced under Dumpling what it was like to triple the size of a certain 
cluster in one swoop.

So one approach is trial and error: see how many you can get away with before 
you get slow requests, then back off.  In production, of course, this is 
playing with fire.  Depending on which release you’re running, cranking down a 
common set of backfill/recovery tunables can help mitigate the thundering-herd 
effect as well.
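
For what it's worth, a minimal sketch of that throttling, in the same
Python-over-CLI style as above.  osd_max_backfills and
osd_recovery_max_active are the usual knobs; the values here are just
conservative examples:

    #!/usr/bin/env python3
    # Sketch: throttle backfill/recovery cluster-wide before adding OSDs.
    # Assumes a release with the centralized config database (ceph config
    # set); on older releases ceph tell osd.* injectargs does the same job.
    import subprocess

    def ceph(*args):
        subprocess.run(["ceph", *args], check=True)

    ceph("config", "set", "osd", "osd_max_backfills", "1")
    ceph("config", "set", "osd", "osd_recovery_max_active", "1")

Just remember to restore (or ceph config rm) those once the expansion is
done, or recovery from a real failure will crawl too.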

— aad

> This morning I tried the careful approach, and added one OSD to server1. 
> It all went fine, everything rebuilt and I have a HEALTH_OK again now. 
> It took around 7 hours.
> 
> But now I started thinking... (and that's when things go wrong, 
> therefore hoping for feedback here....)
> 
> The question: was I being stupid to add only ONE osd to the server1? Is 
> it not smarter to add all four OSDs at the same time?
> 
> I mean: things will rebuild anyway...and I have the feeling that 
> rebuilding from 4 -> 8 OSDs is not going to be much heavier than 
> rebuilding from 4 -> 5 OSDs. Right?
> 
> So better add all new OSDs together on a specific server?
