Hi Serkan Coban,

We adapted the script, and the solution you proposed is working fine. Thank
you for your support.

Thanks,
Muthu

On Wed, Apr 18, 2018 at 8:53 PM, Serkan Çoban <cobanser...@gmail.com> wrote:

> > 68 OSDs per node sounds like an order of magnitude more than what you
> > should be doing, unless you have vast experience with Ceph and its
> > memory requirements under stress.
> I don't think so. We are also evaluating 90 OSDs per node. In order to
> know that it works you need to test all the scenarios. Red Hat supports a
> maximum of 72 OSDs per host, so they are still within the support limits.
>
> When QoS support arrives I hope we can put bandwidth limits on
> recovery; until then we have to do whatever is acceptable and works for
> now...
>
> On Wed, Apr 18, 2018 at 5:50 PM, Hans van den Bogert <hansbog...@gmail.com> wrote:
> > I keep seeing these threads where adding nodes has such an impact on
> > the cluster as a whole that I wonder what the rest of the cluster looks
> > like. Normally I'd just advise someone to put a limit on the concurrent
> > backfills that can be done, and `osd max backfills` is already 1 by
> > default. Could it be that the real culprit here is that the hardware is
> > heavily overbooked? 68 OSDs per node sounds like an order of magnitude
> > more than what you should be doing, unless you have vast experience
> > with Ceph and its memory requirements under stress.
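> >
> > As a quick sanity check, something along these lines should show and
> > cap the backfill concurrency at runtime (Luminous-era syntax; the
> > values are illustrative):
> >
> >     # on an OSD host, check what one OSD is actually running with
> >     ceph daemon osd.0 config get osd_max_backfills
> >
> >     # cap it cluster-wide at runtime, no restarts needed
> >     ceph tell 'osd.*' injectargs '--osd-max-backfills 1'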
> > I wonder if this cluster would even come online after an outage, or
> > whether it would crumble due to peering and possible backfilling.
> >
> > To be honest I don't even get why using the weight option would solve
> > this. The same amount of data needs to be transferred anyway at some
> > point; it seems like a poor man's throttling mechanism. And if memory
> > shortage is the problem here, due again to the many OSDs, then the
> > reweight strategy will only give you slightly better odds.
> >
> > So:
> > 1) I would keep track of memory usage on the nodes to see if it
> > increases under peering/backfilling.
> >   - If it does, and you're using BlueStore: try lowering the
> > bluestore_cache_size* params to give yourself some leeway.
> > 2) If using BlueStore, try throttling by changing the following params,
> > depending on your environment (see the sketch below):
> >   - osd recovery sleep
> >   - osd recovery sleep hdd
> >   - osd recovery sleep ssd
> >
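> > A minimal sketch at runtime (Luminous-era syntax; the exact values
> > are illustrative, not recommendations):
> >
> >     # pause briefly between recovery ops on HDD-backed OSDs
> >     ceph tell 'osd.*' injectargs '--osd-recovery-sleep-hdd 0.1'
> >
> >     # shrink the BlueStore cache on HDD OSDs to roughly 1 GiB
> >     # (this one may only take effect after an OSD restart)
> >     ceph tell 'osd.*' injectargs '--bluestore-cache-size-hdd 1073741824'
> >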
> > There are other throttling params you can change, though most
> > defaults are just fine in my environment and I don't have experience
> > with them.
> >
> > Good luck,
> >
> > Hans
> >
> >
> >> On Apr 18, 2018, at 1:32 PM, Serkan Çoban <cobanser...@gmail.com> wrote:
> >>
> >> You can add the new OSDs with weight 0 and edit the script below to
> >> increase the OSD weights instead of decreasing them.
> >>
> >> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight
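> >>
> >> If you prefer to do it by hand, the same idea looks roughly like this
> >> (osd.340, host=cn6 and the weights are placeholders for your setup):
> >>
> >>     # create the OSD with CRUSH weight 0 so no data moves yet
> >>     ceph osd crush add osd.340 0 host=cn6
> >>
> >>     # once the cluster is healthy again, raise the weight in small
> >>     # steps, letting backfill finish in between, until you reach the
> >>     # disk's size in TiB
> >>     ceph osd crush reweight osd.340 0.5
> >>     ceph osd crush reweight osd.340 1.0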
> >>
> >>
> >> On Wed, Apr 18, 2018 at 2:16 PM, nokia ceph <nokiacephus...@gmail.com> wrote:
> >>> Hi All,
> >>>
> >>> We have a 5-node cluster with EC 4+1. Each node has 68 HDDs. Now we
> >>> are trying to add a new node with 68 disks to the cluster.
> >>>
> >>> We tried adding the new node and creating all the OSDs in one go;
> >>> the cluster stopped serving client traffic and did only backfilling.
> >>>
> >>> Is there a procedure to add the new node without affecting client
> >>> traffic?
> >>>
> >>> If we create the OSDs one by one, there is no issue with client
> >>> traffic; however, the time taken to add a new node with 68 disks
> >>> would be several months.
> >>>
> >>> Please provide your suggestions.
> >>>
> >>> Thanks,
> >>> Muthu