On Tuesday, October 21, 2014, Chad Seys <[email protected]> wrote:
> Hi Craig,
>
> > It's part of the way the CRUSH hashing works. Any change to the CRUSH map
> > causes the algorithm to change slightly.
>
> Dan@cern could not replicate my observations, so I plan to follow his
> procedure (fake create an OSD, wait for rebalance, remove fake OSD) in the
> near future to see if I can replicate his! :)
>
> > BTW, it's safer to remove OSDs and hosts by first marking the OSDs UP and
> > OUT (ceph osd out OSDID). That will trigger the remapping, while keeping
> > the OSDs in the pool so you have all of your replicas.
>
> I am under the impression that the procedure I posted does leave the OSDs in
> the pool while an additional replication takes place: after "ceph osd crush
> remove osd.osdnum" I see that the used % on the removed OSD slowly decreases
> as the relocation of blocks takes place.
>
> If my ceph-fu were strong enough I would try to find some block replicated
> num_replicas+1 times so that my belief would be well-founded. :)
>
> Also, "ceph osd crush remove osd.osdnum" still shows the OSD in "ceph osd
> tree", but it is not attached to any server. I think it might even be marked
> UP and DOWN, but I cannot confirm.
>
> So I believe so far the approaches are equivalent.
>
> BUT, I think that to keep an OSD out after using "ceph osd out OSDID" one
> needs to turn off "auto in" or something.
>
> I don't want to turn that off b/c in the past I had some slow drives which
> would occasionally be marked "out". If they stayed "out" that could increase
> load on other drives, making them unresponsive, getting them marked "out" as
> well, leading to a domino effect where too many drives get marked "out" and
> the cluster goes down.
>
> Now I have better hardware, but since the scenario exists, I'd rather avoid
> it! :)

There are separate options for automatically marking new drives in versus
marking in established ones. Should be in the docs!
:)
-Greg

> > If you mark the OSDs OUT, wait for the remapping to finish, and remove the
> > OSDs and host from the CRUSH map, there will still be some data migration.
>
> Yep, this is what I see. But I find it weird.
>
> > Ceph is also really good at handling multiple changes in a row. For
> > example, I had to reformat all of my OSDs because I chose my mkfs.xfs
> > parameters poorly. I removed the OSDs, without draining them first, which
> > caused a lot of remapping. I then quickly formatted the OSDs, and put them
> > back in. The CRUSH map went back to what it started with, and the only
> > remapping required was to re-populate the newly formatted OSDs.
>
> In this case you'd be living with num_replicas-1 for a while. Sounds
> exciting! :)
>
> Thanks,
> Chad.
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
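[Editor's note: the separate "auto in" options Greg alludes to appear to be the monitor settings below. Names and defaults here are my reading of the Ceph config reference for releases of that era; verify against your version before relying on them.]

```ini
[mon]
# Blanket setting: mark ANY booting OSD "in". Leaving this false avoids
# the behavior Chad is worried about having to disable.
mon osd auto mark in = false

# Only mark a booting OSD "in" if it was previously marked "out"
# automatically (e.g. after being down too long), not manually.
mon osd auto mark auto out in = true

# Mark newly created OSDs "in" when they first boot, so fresh drives
# join the cluster without a manual "ceph osd in".
mon osd auto mark new in = true
```

With this split, a manual "ceph osd out OSDID" should stick (the OSD is not auto-marked back in), while brand-new OSDs still come in automatically.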
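[Editor's note: for readers following the thread, the "mark out first" removal sequence being compared above might look roughly like this sketch. The OSD id 5 is a placeholder, the health-polling loop is a simplification, and the daemon stop command varies by distro and Ceph release.]

```
# 1. Mark the OSD out: triggers remapping while the OSD keeps serving
#    its replicas, so redundancy is never reduced.
ceph osd out 5

# 2. Wait for recovery to finish (all PGs active+clean).
while ! ceph health | grep -q HEALTH_OK; do sleep 60; done

# 3. Stop the daemon, then remove the OSD from the CRUSH map,
#    the auth database, and the OSD map.
sudo service ceph stop osd.5
ceph osd crush remove osd.5
ceph auth del osd.5
ceph osd rm 5
```

The crush-remove-first approach discussed above skips step 1, relying on the observation that data drains off the OSD after the CRUSH removal; the out-first sequence is the conservative ordering.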
