On 16-01-11 04:10, Rafael Lopez wrote:
Thanks for the replies guys.
@Steve, even when you remove an OSD due to failure, have you noticed that the
cluster rebalances twice using the documented steps? You may not notice it if
you don't wait for the initial recovery after 'ceph osd out'. If you
do 'ceph osd out' and immediately 'ceph osd crush remove', RH support
has told me that this effectively 'cancels' the original move
triggered by 'ceph osd out' and starts permanently remapping...
which still doesn't really explain why we have to do the ceph osd out
in the first place...
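For clarity, the two orderings being compared are roughly as follows (osd.12 is just a hypothetical example id):

    # double rebalance: wait for recovery in between
    ceph osd out 12                  # first (temporary) remap
    # ... wait for recovery to finish ...
    ceph osd crush remove osd.12     # second (permanent) remap

    # back to back: the crush remove supersedes the move triggered by 'out'
    ceph osd out 12
    ceph osd crush remove osd.12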
It needs to be tested, but I think it may not allow you to do the crush remove
before doing osd out (i.e. you shouldn't be removing OSDs from the crush map
while they are still in the cluster). At least that was the case with up OSDs
when I was doing some testing.
@Dan, good to hear it works. I will try that method next time and see
how it goes!
On 8 January 2016 at 03:08, Steve Taylor <[email protected]> wrote:
If I’m not mistaken, marking an osd out will remap its placement
groups temporarily, while removing it from the crush map will
remap the placement groups permanently. Additionally, other
placement groups from other osds could get remapped permanently
when an osd is removed from the crush map. I would think the only
benefit to marking an osd out before stopping it would be a
cleaner redirection of client I/O before the osd disappears, which
may be worthwhile if you’re removing a healthy osd.
As for reweighting to 0 prior to removing an osd, it seems like
that would give the osd the ability to participate in the recovery
essentially in read-only fashion (plus deletes) until it’s empty,
so objects wouldn’t become degraded as placement groups are
backfilling onto other osds. Again, this would really only be
useful if you’re removing a healthy osd. If you’re removing an osd
where other osds in different failure domains are known to be
unhealthy, it seems like this would be a really good idea.
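For reference, a rough sketch of that reweight-first approach (osd.12 is a hypothetical id; adjust for your environment):

    ceph osd crush reweight osd.12 0   # drain the osd via backfill while it stays up and in
    # wait until all placement groups are active+clean
    ceph osd out 12
    # stop the ceph-osd daemon on its host, then remove the osd
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12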
I usually follow the documented steps you’ve outlined myself, but
I’m typically removing osds due to failed/failing drives while the
rest of the cluster is healthy.
------------------------------------------------------------------------
*Steve Taylor* | Senior Software Engineer | StorageCraft Technology
Corporation <http://www.storagecraft.com/>
380 Data Drive Suite 300 | Draper | Utah | 84020
*Office: *801.871.2799 | *Fax: *801.545.4705
------------------------------------------------------------------------
If you are not the intended recipient of this message, be advised
that any dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender
and delete it, together with any attachments.
*From:* ceph-users [mailto:[email protected]] *On Behalf Of* Rafael Lopez
*Sent:* Wednesday, January 06, 2016 4:53 PM
*To:* [email protected]
*Subject:* [ceph-users] double rebalance when removing osd
Hi all,
I am curious what practices other people follow when removing OSDs
from a cluster. According to the docs, you are supposed to:
1. ceph osd out
2. stop daemon
3. ceph osd crush remove
4. ceph auth del
5. ceph osd rm
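For a hypothetical osd.12, those documented steps translate roughly to:

    ceph osd out 12                  # 1. mark it out
    # 2. stop the ceph-osd daemon on its host
    ceph osd crush remove osd.12     # 3. remove it from the crush map
    ceph auth del osd.12             # 4. remove its auth key
    ceph osd rm 12                   # 5. remove it from the osd map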
What value does ceph osd out (1) add to the removal process, and
why is it in the docs? We have found (as have others) that by
outing (1) and then crush removing (3), the cluster has to do two
recoveries. Is it necessary? Can you just do the crush remove
without step 1?
I found this earlier message from GregF in which he seems to affirm
that just doing the crush remove is fine:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007227.html
This recent blog post from Sebastien suggests reweighting to
0 first, but I haven't tested it:
http://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
I thought that by marking it out, it sets the reweight to 0
anyway, so I'm not sure how this would make a difference in terms of
the two rebalances, but maybe there is a subtle difference...?
Thanks,
Raf
--
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
738 Blackburn Rd, Clayton
Monash University 3800
Telephone: +61 3 9905 9118
Mobile: +61 4 27 682 670
Email: [email protected]
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com