Hi Dan,

Great story, and congratulations on the successful conversion :-) There are two minor pitfalls left, but they are only an inconvenience when testing the ceph-disk prepare / udev logic (https://github.com/ceph/ceph/pull/2717 and https://github.com/ceph/ceph/pull/2648).
Cheers

On 15/10/2014 13:20, Dan van der Ster wrote:
> Hi Ceph users,
>
> (sorry for the novel, but perhaps this might be useful for someone)
>
> During our current project to upgrade our cluster from disks-only to SSD journals, we've found it useful to convert our legacy puppet-ceph deployed cluster (using something like the enovance module) to one that looks like it has had its OSDs created with ceph-disk prepare. It's been educational for me, and I thought it would be good to share the experience.
>
> To start, the "old" puppet-ceph configures OSDs explicitly in ceph.conf, like this:
>
>   [osd.211]
>   host = p05151113489275
>   devs = /dev/disk/by-path/pci-0000:02:00.0-sas-...-lun-0-part1
>
> and ceph-disk list says this about the disks:
>
>   /dev/sdh :
>    /dev/sdh1 other, xfs, mounted on /var/lib/ceph/osd/osd.211
>
> In other words, ceph-disk doesn't know anything about the OSD living on that disk.
>
> Before deploying our SSD journals I was trying to find the best way to map OSDs to SSD journal partitions (in puppet!), but basically there is no good way to do this with the legacy puppet-ceph module. (What we'd have to do is puppetize the partitioning of the SSDs, then manually map OSDs to SSD partitions. This would be tedious, and also error-prone after disk replacements and reboots.)
>
> However, I've found that by using ceph-deploy, i.e. ceph-disk, to prepare and activate OSDs, this becomes very simple, trivial even. Using ceph-disk we keep the OSD/SSD mapping out of puppet; instead the state is stored in the OSD itself. (1.5 years ago when we deployed this cluster, ceph-deploy was advertised as a quick tool to spin up small clusters, so we didn't dare use it. I realize now that it (or the puppet/chef/... recipes based on it) is _the_only_way_ to build a cluster if you're starting out today.)
>
> Now our problem was that I couldn't go and re-ceph-deploy the whole cluster, since we've got some precious user data there.
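For context on why ceph-disk reports a legacy partition as plain "other, xfs": it classifies partitions by their GPT type GUID. The sketch below illustrates that lookup with the two well-known ceph-disk type-code constants; the classify helper and the example generic GUID are illustrative, not part of ceph-disk itself.

```shell
#!/bin/sh
# GPT type-code GUIDs that ceph-disk (and its udev rules) match on:
JOURNAL_UUID=45b0969e-9b03-4f30-b4c6-b4b80ceff106   # 'ceph journal'
OSD_UUID=4fbd7e29-9d25-41b8-afd0-062c0ceff05d       # 'ceph data'

# Hypothetical helper mimicking the classification, for illustration only.
# $1 = a partition's type GUID, as printed by e.g. `sgdisk --info=1 /dev/sdh`
classify() {
    case "$1" in
        "$OSD_UUID")     echo "ceph data" ;;
        "$JOURNAL_UUID") echo "ceph journal" ;;
        *)               echo "other" ;;
    esac
}

# A legacy puppet-ceph partition carries a generic type GUID
# (here the common 'Linux filesystem' one), hence "other":
classify 0fc63daf-8483-4772-8e79-3d69d8477de4   # prints: other
classify "$OSD_UUID"                            # prints: ceph data
```

This is also why relabeling the partitions, as described below, is enough to make the udev magic take over.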
> Instead, I needed to learn how ceph-disk labels and prepares disks, and modify our existing OSDs in place to look like they'd been prepared and activated with ceph-disk.
>
> In the end, I've worked out all the configuration and sgdisk magic and put the recipes into a couple of scripts here [1]. Note that I do not expect these to work for any other cluster unmodified. In fact, that would be dangerous, so don't blame me if you break something. But they might be helpful for understanding how the ceph-disk udev magic works, and could be a basis for upgrading other clusters.
>
> The scripts are:
>
> ceph-deployifier/ceph-create-journals.sh:
>   - this script partitions the SSDs (assuming sda to sdd) with 5 partitions each
>   - the only trick is to add the partition name 'ceph journal' and set the typecode to the magic JOURNAL_UUID, along with a random partition guid
>
> ceph-deployifier/ceph-label-disks.sh:
>   - this script discovers the next OSD which is not prepared with ceph-disk, finds an appropriate unused journal partition, and converts the OSD to a ceph-disk prepared lookalike.
>   - aside from the discovery part, the main magic is to:
>     - create the files active, sysvinit and journal_uuid on the OSD
>     - rename the partition to 'ceph data', set the typecode to the magic OSD_UUID, and the partition guid to the OSD's uuid
>     - create the /dev/disk/by-partuuid/ journal symlink, and make the new journal
>   - at the end, udev is triggered and the OSD is started (via the ceph-disk activation magic)
>
> The complete details are of course in the scripts. (I also have another version of ceph-label-disks.sh that doesn't expect an SSD journal, but instead prepares the single-disk, 2-partition scheme.)
>
> After running these scripts you'll get a nice shiny ceph-disk list output:
>
>   /dev/sda :
>    /dev/sda1 ceph journal, for /dev/sde1
>    /dev/sda2 ceph journal, for /dev/sdf1
>    /dev/sda3 ceph journal, for /dev/sdg1
>   ...
>   /dev/sde :
>    /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sda1
>   /dev/sdf :
>    /dev/sdf1 ceph data, active, cluster ceph, osd.8, journal /dev/sda2
>   /dev/sdg :
>    /dev/sdg1 ceph data, active, cluster ceph, osd.12, journal /dev/sda3
>   ...
>
> And all of the udev magic is working perfectly. I've tested all of the reboot, failed-OSD, and failed-SSD scenarios, and it all works as it should. And the puppet-ceph manifest for OSDs is now just a very simple wrapper around ceph-disk prepare. (I haven't published ours to github yet, but it is very similar to the stackforge puppet-ceph manifest.)
>
> There you go, sorry that was so long. I hope someone finds this useful :)
>
> Best Regards,
> Dan
>
> [1] https://github.com/cernceph/ceph-scripts/tree/master/tools/ceph-deployifier
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Loïc Dachary, Artisan Logiciel Libre
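Condensed, the conversion Dan describes boils down to a handful of sgdisk and udev calls. The sketch below only echoes the commands (RUN=echo) rather than running them; the device names, partition numbers, sizes, OSD number, and the placeholder OSD uuid are illustrative, while the two type-code GUIDs are the ceph-disk constants the udev rules match on. It is a minimal sketch of what the scripts do, not a drop-in replacement for them.

```shell
#!/bin/sh
# Dry-run sketch: set RUN= to execute for real -- destructive, don't do
# that blindly on a live disk.
RUN=echo

# GPT type-code GUIDs that ceph-disk's udev rules recognize:
JOURNAL_UUID=45b0969e-9b03-4f30-b4c6-b4b80ceff106   # 'ceph journal'
OSD_UUID=4fbd7e29-9d25-41b8-afd0-062c0ceff05d       # 'ceph data'

# ceph-create-journals.sh essence: a named, typed journal partition on the
# SSD, with a fresh random partition guid (size and device illustrative).
part_guid=$(uuidgen 2>/dev/null || echo 11111111-2222-3333-4444-555555555555)
$RUN sgdisk --new=1:0:+10G --change-name=1:"ceph journal" \
     --typecode=1:"$JOURNAL_UUID" --partition-guid=1:"$part_guid" /dev/sda

# ceph-label-disks.sh essence: mark the OSD as active and sysvinit-managed,
# then relabel the existing data partition. In the real script the partition
# guid is set to the OSD's own uuid; here it's a placeholder.
osd_uuid=22222222-3333-4444-5555-666666666666
$RUN touch /var/lib/ceph/osd/osd.211/active /var/lib/ceph/osd/osd.211/sysvinit
$RUN sgdisk --change-name=1:"ceph data" --typecode=1:"$OSD_UUID" \
     --partition-guid=1:"$osd_uuid" /dev/sde

# point the OSD's journal symlink at the stable by-partuuid name,
# then kick udev so the ceph-disk activation magic starts the OSD
$RUN ln -sf /dev/disk/by-partuuid/"$part_guid" /var/lib/ceph/osd/osd.211/journal
$RUN udevadm trigger --action=add --sysname-match=sde1
```

Keeping the mapping in the partition table and in /dev/disk/by-partuuid/ is exactly what makes the OSD/journal pairing survive reboots and disk replacements without any puppet state.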
