I keep seeing threads where adding nodes has such an impact on the cluster as 
a whole that I wonder what the rest of the cluster looks like. Normally I'd 
just advise putting a limit on the concurrent backfills, and `osd max 
backfills` already defaults to 1. Could the real culprit here be that the 
hardware is heavily overbooked? 68 OSDs per node sounds an order of magnitude 
above what you should be running, unless you have vast experience with Ceph 
and its memory requirements under stress. I wonder whether this cluster would 
even come back online after an outage, or whether it would crumble under 
peering and possible backfilling.
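For reference, the backfill limit mentioned above is a per-OSD setting; a minimal ceph.conf sketch (the value shown is the default in recent releases, included only for illustration):

```ini
[osd]
# Maximum number of concurrent backfill operations per OSD.
# 1 is already the default; raising it speeds up rebalancing at the
# cost of more impact on client I/O.
osd max backfills = 1
```
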

To be honest, I don't even see why using the weight option would solve this. 
The same amount of data needs to be transferred anyway at some point; it seems 
like a poor man's throttling mechanism. And if memory shortage is the issue 
here, due to, again, the many OSDs, then the reweight strategy will only give 
you slightly better odds.

So:
1) Keep track of memory usage on the nodes to see whether it increases 
under peering/backfilling. 
  - If it does, and you're using BlueStore: try lowering the 
bluestore_cache_size* params to give yourself some leeway.
2) If using BlueStore, try throttling recovery by changing the following 
params, depending on your environment:
  - osd recovery sleep
  - osd recovery sleep hdd
  - osd recovery sleep ssd
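
The two steps above could look something like this in ceph.conf (a sketch only; the values are illustrative starting points, not recommendations, and defaults vary by release, so tune for your own hardware):

```ini
[osd]
# Step 1: shrink the per-OSD BlueStore cache if memory is tight.
# 536870912 bytes = 512 MiB; the HDD default is larger in most releases.
bluestore cache size hdd = 536870912

# Step 2: sleep (in seconds) between recovery/backfill operations.
# A higher value slows recovery but leaves more headroom for client I/O.
osd recovery sleep hdd = 0.1
osd recovery sleep ssd = 0.0
```

These can also be changed at runtime without a restart, which makes it easier to experiment until client latency is acceptable.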

There are other throttling params you can change, but most defaults are just 
fine in my environment, and I don't have hands-on experience with them.

Good luck, 

Hans


> On Apr 18, 2018, at 1:32 PM, Serkan Çoban <[email protected]> wrote:
> 
> You can add new OSDs with 0 weight and edit below script to increase
> the osd weights instead of decreasing.
> 
> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight
> 
> 
> On Wed, Apr 18, 2018 at 2:16 PM, nokia ceph <[email protected]> wrote:
>> Hi All,
>> 
>> We are having 5 node cluster with EC 4+1 . Each node has 68 HDD . Now we are
>> trying to add new node with 68 disks to the cluster .
>> 
>> We tried to add new node and created all OSDs in one go , the cluster
>> stopped all client traffic and does only backfilling .
>> 
>> Any procedure to add the new node without affecting the client traffic ?
>> 
>> If we create  OSDs one by one , then there is no issue in client traffic
>> however  time taken to add new node with 68 disks will be several months.
>> 
>> Please provide your suggestions..
>> 
>> Thanks,
>> Muthu
>> 
>> 
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
