For some reason Gmail wasn't showing me any of the other responses.  I saw
what appeared to be an unanswered question.  As soon as I hit send, the
rest of the discussion showed up.  Sorry about that.


On Tue, Oct 21, 2014 at 8:25 AM, Chad Seys <[email protected]> wrote:

> Hi Craig,
>
> > It's part of the way the CRUSH hashing works.  Any change to the CRUSH
> > map causes the algorithm to change slightly.
>
> Dan@cern could not replicate my observations, so I plan to follow his
> procedure (fake create an OSD, wait for rebalance, remove fake OSD) in the
> near future to see if I can replicate his! :)
>

I haven't actually tried your OSD+HOST test, but I know that this sequence
triggers data movement twice:

ceph osd rm <osd-id>
# Wait for data movement to finish
ceph osd crush remove <osd-id>
# More data movement

The discussion about that indicated that any change to the CRUSH map will
cause some data movement, but it's possible that I'm mis-remembering.
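To make that point concrete, here's a toy placement sketch.  This is NOT
actual CRUSH -- it's plain rendezvous hashing versus a naive modulo scheme,
with made-up object and device names -- but it shows why hash-based placement
limits, without eliminating, the data that moves when a device leaves the map:

```python
# Toy placement demo -- NOT actual CRUSH, just an illustration of why
# hash-based placement bounds (but can't avoid) movement on map changes.
import hashlib

def score(obj, dev):
    # Deterministic pseudo-random score for an (object, device) pair.
    return hashlib.md5(f"{obj}:{dev}".encode()).hexdigest()

def place_rendezvous(obj, devices):
    # Rendezvous hashing: each object lives on its highest-scoring device.
    return max(devices, key=lambda d: score(obj, d))

def place_modulo(obj, devices):
    # Naive scheme: hash the object name, take it modulo the device count.
    return devices[int(hashlib.md5(obj.encode()).hexdigest(), 16) % len(devices)]

devices = ["osd.0", "osd.1", "osd.2", "osd.3"]
objs = [f"obj-{i}" for i in range(1000)]

for name, place in [("rendezvous", place_rendezvous), ("modulo", place_modulo)]:
    before = {o: place(o, devices) for o in objs}
    after = {o: place(o, devices[:-1]) for o in objs}  # remove osd.3
    moved = sum(1 for o in objs if before[o] != after[o])
    print(f"{name}: {moved}/{len(objs)} objects moved after removing osd.3")
```

With rendezvous hashing, only the objects that lived on the removed device
move (roughly a quarter here); the modulo scheme reshuffles about three
quarters of everything.  CRUSH behaves much closer to the former, which is
why map changes cause some, but bounded, movement.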



>
> > BTW, it's safer to remove OSDs and hosts by first marking the OSDs UP and
> > OUT (ceph osd out OSDID).  That will trigger the remapping, while keeping
> > the OSDs in the pool so you have all of your replicas.
>
> I am under the impression that the procedure I posted does leave the OSDs
> in the pool while an additional replication takes place: after "ceph osd
> crush remove osd.osdnum" I see that the used % on the removed OSD slowly
> decreases as the relocation of blocks takes place.
>
>
Now that I see the rest of the discussion... you can check the ceph pg
query output during backfill.  You should see n+1 OSDs in the acting
set.
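For reference, a quick sketch of pulling the up and acting sets out of the
pg query JSON.  The sample blob below is hand-trimmed and invented; a real
`ceph pg <pgid> query` returns far more fields:

```python
import json

# Hand-made, trimmed sample of what `ceph pg <pgid> query` emits;
# a real query returns many more fields than this.
sample = """
{
  "state": "active+remapped+backfilling",
  "up": [4, 7],
  "acting": [4, 7, 2]
}
"""

pg = json.loads(sample)
# During backfill the acting set should hold one extra OSD (n+1),
# so all replicas stay readable while the new OSD is populated.
print("state:", pg["state"])
print("up:", pg["up"], "acting:", pg["acting"])
print("extra OSDs in acting:", len(pg["acting"]) - len(pg["up"]))
```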



>
> BUT, I think that to keep an OSD out after using "ceph osd out OSDID" one
> needs to turn off "auto in" or something.
>

As long as it doesn't restart, it should stay OUT.
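If memory serves, the knob you're thinking of is the monitor's auto-mark-in
behavior.  Something like the following in ceph.conf should keep
auto-marked-out OSDs from being marked in again when they boot (the exact
option name is worth verifying against your release's docs):

```
[mon]
# Don't automatically mark auto-marked-out OSDs back in on boot.
# (Verify the option name against your Ceph release.)
mon osd auto mark auto out in = false
```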



> I don't want to turn that off b/c in the past I had some slow drives which
> would occasionally be marked "out".  If they stayed "out" that could
> increase load on other drives, making them unresponsive, getting them
> marked "out" as well, leading to a domino effect where too many drives
> get marked "out" and the cluster goes down.
>

You might want to tweak mon osd down out interval, mon osd min down
reporters, and mon osd min down reports.  I bumped up the down out
interval because I had some OSDs that would get kicked out for being
unresponsive; they would hang for 15 minutes until they hit the suicide
timeout and restarted.  The min down reporters and reports are less
relevant, but I had a similar problem with one slow host marking all of the
other OSDs down.  I bumped up reports and reporters so that it takes
more than one host to get OSDs kicked out.
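In ceph.conf terms, that tuning looks something like this (the values here
are purely illustrative, not recommendations):

```
[mon]
# Illustrative values only -- tune for your own cluster.
# Seconds an OSD stays down before the monitors mark it out:
mon osd down out interval = 600
# How many distinct OSDs must report a peer down, and how many
# total reports are needed, before the monitors mark it down:
mon osd min down reporters = 2
mon osd min down reports = 3
```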



> > Ceph is also really good at handling multiple changes in a row.  For
> > example, I had to reformat all of my OSDs because I chose my mkfs.xfs
> > parameters poorly.  I removed the OSDs, without draining them first,
> > which caused a lot of remapping.  I then quickly formatted the OSDs,
> > and put them back in.  The CRUSH map went back to what it started with,
> > and the only remapping required was to re-populate the newly formatted
> > OSDs.
>
> In this case you'd be living with num_replicas-1 for a while.  Sounds
> exciting!  :)
>
>
Yeah, not something I'd generally recommend.  I have federation set up, so
I'm able to recover any lost data from the other cluster.  I was also
pretty desperate; the XFS params I used were causing massive cluster
instability.  When I started, half of my OSDs were flapping.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
