Hi Serkan Çoban,

We adapted the script, and the solution you proposed is working fine. Thank you for your support.
Thanks,
Muthu

On Wed, Apr 18, 2018 at 8:53 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> > 68 OSDs per node sounds an order of magnitude above what you should be
> > doing, unless you have vast experience with Ceph and its memory
> > requirements under stress.
> I don't think so. We are also evaluating 90 OSDs per node. In order to
> know whether it works, you need to test all the scenarios. Red Hat
> supports a maximum of 72 OSDs per host, so we are still within the
> support limits.
>
> When QoS support arrives, I hope we can put bandwidth limits on
> recovery; until then, we need to do what is acceptable and works for
> now...
>
> On Wed, Apr 18, 2018 at 5:50 PM, Hans van den Bogert
> <hansbog...@gmail.com> wrote:
> > I keep seeing these threads where adding nodes has such an impact on
> > the cluster as a whole that I wonder what the rest of the cluster
> > looks like. Normally I'd just advise someone to put a limit on the
> > concurrent backfills that can be done, and `osd max backfills` by
> > default is already 1. Could it be that the real culprit here is that
> > the hardware is heavily overbooked? 68 OSDs per node sounds an order
> > of magnitude above what you should be doing, unless you have vast
> > experience with Ceph and its memory requirements under stress.
> >
> > I wonder whether this cluster would even come online after an outage,
> > or would also crumble due to peering and possible backfilling.
> >
> > To be honest, I don't even get why using the weight option would
> > solve this. The same amount of data needs to be transferred anyway at
> > some point; it seems like a poor man's throttling mechanism. And if
> > memory shortage is the issue here, due to, again, the many OSDs, then
> > the reweight strategy will only give you slightly better odds.
> >
> > So:
> > 1) I would keep track of memory usage on the nodes to see if it
> >    increases under peering/backfilling.
> >    - If this is the case, and you're using BlueStore: try lowering
> >      the bluestore_cache_size* params to give yourself some leeway.
> > 2) If using BlueStore, try throttling by changing the following
> >    params, depending on your environment:
> >    - osd recovery sleep
> >    - osd recovery sleep hdd
> >    - osd recovery sleep ssd
> >
> > There are other throttling params you can change, though most
> > defaults are just fine in my environment, and I don't have experience
> > with them.
> >
> > Good luck,
> >
> > Hans
> >
> >> On Apr 18, 2018, at 1:32 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> >>
> >> You can add the new OSDs with weight 0 and edit the script below to
> >> increase the OSD weights instead of decreasing them.
> >>
> >> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight
> >>
> >> On Wed, Apr 18, 2018 at 2:16 PM, nokia ceph <nokiacephus...@gmail.com> wrote:
> >>> Hi All,
> >>>
> >>> We have a 5-node cluster with EC 4+1. Each node has 68 HDDs. Now we
> >>> are trying to add a new node with 68 disks to the cluster.
> >>>
> >>> We tried to add the new node and created all the OSDs in one go; the
> >>> cluster stopped all client traffic and did only backfilling.
> >>>
> >>> Is there any procedure to add the new node without affecting client
> >>> traffic?
> >>>
> >>> If we create the OSDs one by one, there is no issue with client
> >>> traffic; however, the time taken to add a new node with 68 disks
> >>> would be several months.
> >>>
> >>> Please provide your suggestions.
> >>>
> >>> Thanks,
> >>> Muthu
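[Editor's note: the recovery-sleep and cache knobs discussed above can be expressed in ceph.conf. A hypothetical fragment, with example values only; these are not recommendations, and defaults differ between releases, so check the documentation for your version:]

```ini
[osd]
; default is 1: number of concurrent backfills allowed per OSD
osd max backfills = 1
; seconds to sleep between recovery/backfill ops on HDD-backed OSDs
; (example value; raising it slows recovery and frees client I/O)
osd recovery sleep hdd = 0.2
; SSD-backed OSDs usually need little or no recovery throttling
osd recovery sleep ssd = 0.0
; example of lowering the BlueStore cache on HDD OSDs (bytes) to
; reduce per-OSD memory footprint on densely packed nodes
bluestore cache size hdd = 536870912
```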
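[Editor's note: the approach Serkan suggests (add the new OSDs at CRUSH weight 0, then raise their weights in small steps, letting the cluster settle between steps) can be sketched roughly as below. This is an illustrative sketch, not the linked ceph-gentle-reweight script: the OSD ids, target weight, and step size are made-up examples, and it only generates the `ceph osd crush reweight` command strings rather than executing them or polling cluster health between steps.]

```python
# Sketch of a "gentle reweight": new OSDs start at CRUSH weight 0 and are
# raised to their target weight in small increments, so only a fraction of
# the backfill load hits the cluster at once. In a real run you would execute
# each batch (e.g. via subprocess) and wait for HEALTH_OK before continuing.

def gentle_reweight_commands(osd_ids, target_weight, step):
    """Return `ceph osd crush reweight` command strings that raise each
    OSD's CRUSH weight from 0 to target_weight in increments of `step`."""
    commands = []
    weight = 0.0
    while weight < target_weight:
        weight = min(weight + step, target_weight)  # cap at the target
        for osd_id in osd_ids:
            commands.append(f"ceph osd crush reweight osd.{osd_id} {weight:.3f}")
    return commands

# Example: two hypothetical new OSDs, 7.27 TiB target weight, 2.0 steps.
for cmd in gentle_reweight_commands([340, 341], target_weight=7.27, step=2.0):
    print(cmd)
```

Between each weight increment, the real script also checks the number of misplaced objects and cluster health before proceeding, which is what keeps client traffic alive during the rebalance.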
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com