Hi Dan,

Great story and congratulations on the successful conversion :-) There are two 
minor pitfalls left, but they are only an inconvenience when testing the 
ceph-disk prepare / udev logic (https://github.com/ceph/ceph/pull/2717 and 
https://github.com/ceph/ceph/pull/2648).

Cheers

On 15/10/2014 13:20, Dan van der Ster wrote:
> Hi Ceph users,
> 
> (sorry for the novel, but perhaps this might be useful for someone)
> 
> During our current project to upgrade our cluster from disks-only to SSD 
> journals, we've found it useful to convert our legacy puppet-ceph-deployed 
> cluster (using something like the enovance module) to one that looks like it 
> has had its OSDs created with ceph-disk prepare. It's been educational for me, 
> and I thought the experience would be worth sharing.
> 
> To start, the "old" puppet-ceph configures OSDs explicitly in ceph.conf, like 
> this:
> 
> [osd.211]
>    host = p05151113489275
>    devs = /dev/disk/by-path/pci-0000:02:00.0-sas-...-lun-0-part1
> 
> and ceph-disk list says this about the disks:
> 
> /dev/sdh :
>  /dev/sdh1 other, xfs, mounted on /var/lib/ceph/osd/osd.211
> 
> In other words, ceph-disk doesn't know anything about the OSD living on that 
> disk.
> 
> Before deploying our SSD journals I was trying to find the best way to map 
> OSDs to SSD journal partitions (in puppet!), but basically there is no good 
> way to do this with the legacy puppet-ceph module. (What we'd have to do is 
> puppetize the partitioning of SSDs, then manually map OSDs to SSD partitions. 
> This would be tedious, and also error prone after disk replacements and 
> reboots).
> 
> However, I've found that by using ceph-deploy, i.e. ceph-disk, to prepare and 
> activate OSDs, this becomes very simple, trivial even. Using ceph-disk we 
> keep the OSD/SSD mapping out of puppet; instead the state is stored in the 
> OSD itself. (1.5 years ago when we deployed this cluster, ceph-deploy was 
> advertised as a quick tool to spin up small clusters, so we didn't dare
> use it. I realize now that it (or the puppet/chef/... recipes based on it) is 
> _the_only_way_ to build a cluster if you're starting out today.)
> 
> Now our problem was that I couldn't go and re-ceph-deploy the whole cluster, 
> since we've got some precious user data there. Instead, I needed to learn how 
> ceph-disk is labeling and preparing disks, and modify our existing OSDs in 
> place to look like they'd been prepared and activated with ceph-disk.
> 
> In the end, I've worked out all the configuration and sgdisk magic and put 
> the recipes into a couple of scripts here [1]. Note that I do not expect 
> these to work for any other cluster unmodified. In fact, that would be 
> dangerous, so don't blame me if you break something. But they might be helpful 
> for understanding how the ceph-disk udev magic works and could be a basis for 
> upgrading other clusters.
> 
> The scripts are:
> 
> ceph-deployifier/ceph-create-journals.sh:
>   - this script partitions SSDs (assuming sda to sdd) with 5 partitions each
>   - the only trick is to add the partition name 'ceph journal' and set the 
> typecode to the magic JOURNAL_UUID along with a random partition guid
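> For illustration, each such journal partition boils down to an sgdisk call 
> along these lines (device, partition number and size are just examples; the 
> typecode is the usual 'ceph journal' GUID):
> 
>   JOURNAL_UUID=45b0969e-9b03-4f30-b4c6-b4b80ceff106
>   PART_GUID=$(uuidgen)                      # random partition guid
>   sgdisk --new=1:0:+20G \
>          --change-name=1:'ceph journal' \
>          --typecode=1:${JOURNAL_UUID} \
>          --partition-guid=1:${PART_GUID} \
>          /dev/sda
>   partprobe /dev/sda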
> 
> ceph-deployifier/ceph-label-disks.sh:
>   - this script discovers the next OSD which is not prepared with ceph-disk, 
> finds an appropriate unused journal partition, and converts the OSD to a 
> ceph-disk prepared lookalike.
>   - aside from the discovery part, the main magic is to:
>     - create the files active, sysvinit and journal_uuid on the OSD
>     - rename the partition to 'ceph data', set the typecode to the magic 
> OSD_UUID, and the partition guid to the OSD's uuid.
>     - link to the /dev/disk/by-partuuid/ journal symlink, and make the new 
> journal
>   - at the end, udev is triggered and the OSD is started (via the ceph-disk 
> activation magic)
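> The sgdisk and marker-file part of that conversion looks roughly like the 
> sketch below, for a single OSD; the ids, devices and paths are placeholders, 
> and the OSD is assumed to be stopped with its old journal already flushed 
> (the real logic lives in ceph-label-disks.sh):
> 
>   # GPT typecode ceph-disk uses for 'ceph data' partitions
>   OSD_UUID=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
>   OSD_ID=2
>   OSD_DIR=/var/lib/ceph/osd/osd.${OSD_ID}   # wherever the OSD is mounted
>   OSD_FSID=$(cat ${OSD_DIR}/fsid)           # the OSD's own uuid
>   JOURNAL_GUID=$(sgdisk --info=1 /dev/sda | awk '/unique GUID/ {print tolower($4)}')
> 
>   # marker files that ceph-disk activation expects to find
>   touch ${OSD_DIR}/sysvinit
>   echo ok > ${OSD_DIR}/active
>   echo ${JOURNAL_GUID} > ${OSD_DIR}/journal_uuid
> 
>   # relabel the data partition and point the journal at the SSD partition
>   sgdisk --change-name=1:'ceph data' --typecode=1:${OSD_UUID} \
>          --partition-guid=1:${OSD_FSID} /dev/sde
>   ln -sf /dev/disk/by-partuuid/${JOURNAL_GUID} ${OSD_DIR}/journal
>   ceph-osd -i ${OSD_ID} --mkjournal
> 
>   # let the ceph-disk udev rules take over and start the OSD
>   udevadm trigger --action=add --sysname-match='sde1'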
> 
> The complete details are of course in the scripts. (I also have another 
> version of ceph-label-disks.sh that doesn't expect an SSD journal but instead 
> prepares the single-disk, two-partition scheme.)
> 
> After running these scripts you'll get a nice shiny ceph-disk list output:
> 
> /dev/sda :
>  /dev/sda1 ceph journal, for /dev/sde1
>  /dev/sda2 ceph journal, for /dev/sdf1
>  /dev/sda3 ceph journal, for /dev/sdg1
> ...
> /dev/sde :
>  /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sda1
> /dev/sdf :
>  /dev/sdf1 ceph data, active, cluster ceph, osd.8, journal /dev/sda2
> /dev/sdg :
>  /dev/sdg1 ceph data, active, cluster ceph, osd.12, journal /dev/sda3
> ...
> 
> And all of the udev magic is working perfectly. I've tested all of the 
> reboot, failed OSD, and failed SSD scenarios and it all works as it should. 
> And the puppet-ceph manifest for OSDs is now just a very simple wrapper 
> around ceph-disk prepare. (I haven't published ours to github yet, but it is 
> very similar to the stackforge puppet-ceph manifest.)
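> Roughly speaking, such a wrapper only has to run something like
> 
>   ceph-disk prepare /dev/sde /dev/sda1    # data disk, then journal partition
> 
> guarded so it doesn't touch a disk that is already labeled 'ceph data'; the 
> devices here are of course just examples.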
> 
> There you go, sorry that was so long. I hope someone finds this useful :)
> 
> Best Regards,
> Dan
> 
> [1] 
> https://github.com/cernceph/ceph-scripts/tree/master/tools/ceph-deployifier
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
