Many thanks for your replies! 

On 21.02.2018 at 02:20, Alfredo Deza wrote:
> On Tue, Feb 20, 2018 at 5:56 PM, Oliver Freyermuth
> <freyerm...@physik.uni-bonn.de> wrote:
>> Dear Cephalopodians,
>>
>> with the new release of ceph-deploy we are thinking about migrating our
>> Bluestore-OSDs (currently created with ceph-disk via the old ceph-deploy)
>> to be created via ceph-volume (with LVM).
> 
> When you say migrating, do you mean creating them again from scratch
> or making ceph-volume take over the previously created OSDs
> (ceph-volume can do both)

I would recreate from scratch to switch to LVM; we have a k=4 m=2 EC pool
with 6 hosts, so I can just take down a full host and recreate.
But good to know that both would work!

> 
>>
>> I note two major changes:
>> 1. It seems the block.db partitions have to be created beforehand, manually.
>>    With ceph-disk, one should not do that - or manually set the correct 
>> PARTTYPE ID.
>>    Will ceph-volume take care of setting the PARTTYPE on existing partitions 
>> for block.db now?
>>    Is it not necessary anymore?
>>    Is the config option bluestore_block_db_size now also obsoleted?
> 
> Right, ceph-volume will not create any partitions for you, so no, it
> will not take care of setting PARTTYPE either. If your setup requires
> a block.db, then this must be
> created beforehand and then passed onto ceph-volume. The one
> requirement if it is a partition is to have a PARTUUID. For logical
> volumes it can just work as-is. This is
> explained in detail at
> http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#bluestore
> 
> PARTUUID information for ceph-volume at:
> http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#partitioning

Ok. 
So do I understand correctly that the PARTTYPE setting (i.e. those magic
numbers as found e.g. in the ceph-disk sources in PTYPE:
https://github.com/ceph/ceph/blob/master/src/ceph-disk/ceph_disk/main.py#L62 )
is not needed anymore for the block.db partitions, since it was effectively
only there to make udev work?

I remember from ceph-disk that if I created the block.db partition beforehand
without setting the magic PARTTYPE, it would become unhappy.
ceph-volume and the systemd activation path should not care about the PARTTYPE
at all, if I understand this correctly.
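
Just to double-check my understanding, this is roughly how I would inspect
what an existing partition actually carries (/dev/sdag1 is only an example
device from my setup):

    # GPT metadata of partition 1 on /dev/sdag: "Partition GUID code" is the
    # PARTTYPE that ceph-disk cared about, "Partition unique GUID" is the
    # PARTUUID that ceph-volume needs.
    sgdisk -i 1 /dev/sdag

    # Just the PARTUUID, e.g. for use in scripts:
    blkid -s PARTUUID -o value /dev/sdag1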

So in short, to create a new OSD, the steps for me would be:
- Create the block.db partition (and not care about the PARTTYPE);
  I only have to make sure it has a PARTUUID.
- ceph-volume lvm create --bluestore --block.db /dev/sdag1 --data /dev/sda
  (or the same via ceph-deploy)
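
Spelled out as commands, a rough sketch of that sequence could look like the
following (the 30 GiB size and the device names are just placeholders from my
setup, not a recommendation):

    # Create a GPT partition for block.db; GPT assigns a PARTUUID
    # automatically, and no special PARTTYPE is needed anymore.
    sgdisk --new=1:0:+30G --change-name=1:'ceph block.db' /dev/sdag
    partprobe /dev/sdag

    # Sanity check: the new partition must have a PARTUUID.
    blkid -s PARTUUID -o value /dev/sdag1

    # Let ceph-volume create the LVM-backed OSD on top of it.
    ceph-volume lvm create --bluestore --data /dev/sda --block.db /dev/sdag1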


>>
>> 2. Activation does not work via udev anymore, which solves some racy things.
>>
>> This second major change makes me curious: How does activation work now?
>> In the past, I could reinstall the full OS, install ceph packages, trigger 
>> udev / reboot and all OSDs would come back,
>> without storing any state / activating any services in the OS.
> 
> Activation works via systemd. This is explained in detail here
> http://docs.ceph.com/docs/master/ceph-volume/lvm/activate
> 
> Nothing with `ceph-volume lvm` requires udev for discovery. If you
> need to re-install the OS and recover your OSDs all you need to do is
> to
> re-activate them. You would need to know what the ID and UUID of the OSDs is.
> 
> If you don't have that information handy, you can run:
> 
>     ceph-volume lvm list
> 
> And all the information will be available. This will persist even on
> system re-installs

Understood - so indeed the manual step would be to run "list" and then
activate the OSDs one-by-one to re-create the service files.
More cumbersome than letting udev do its thing, but it certainly gives more
control, so it seems preferable.
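
For reference, on a freshly reinstalled host that would then be something like
the following (OSD id 3 and the fsid are made-up placeholders; the real values
come from the list output):

    # Show all OSDs that ceph-volume can discover from the LVM metadata.
    ceph-volume lvm list

    # Re-create and enable the systemd units for one OSD (id and fsid).
    ceph-volume lvm activate 3 aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee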

Are there plans to have something like
"ceph-volume discover-and-activate",
which would effectively run "ceph-volume lvm list" and then activate all OSDs
that are re-discovered from the LVM metadata?

This would greatly simplify OS reinstalls (otherwise I'll likely write a small
shell script to do exactly that), and as far as I understand, activating an
already activated OSD should be harmless (it should only re-enable an already
enabled service file).
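
Such a script could be a rough sketch along these lines - assuming jq is
installed and that "ceph-volume lvm list --format json" keeps its current
output layout (top-level keys are the OSD ids, and each device entry carries
a ceph.osd_fsid tag):

    #!/bin/bash
    # Re-activate every OSD that ceph-volume can discover from LVM metadata.
    set -euo pipefail

    ceph-volume lvm list --format json |
      jq -r 'to_entries[] | .key as $id
             | .value[0].tags["ceph.osd_fsid"]
             | "\($id) \(.)"' |
      while read -r osd_id osd_fsid; do
          echo "Activating OSD ${osd_id} (${osd_fsid})"
          ceph-volume lvm activate "${osd_id}" "${osd_fsid}"
      done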

> 
>>
>> Does this still work?
>> Or is there a manual step needed to restore the ceph-osd@ID-UUID services 
>> which at first glance appear to store state (namely ID and UUID)?
> 
> The manual step would be to call activate as described here
> http://docs.ceph.com/docs/master/ceph-volume/lvm/activate/#new-osds
>>
>> If that's the case:
>> - What is this magic manual step?
> 
> Linked above
> 
>> - Is it still possible to flip two disks within the same OSD host without 
>> issues?
> 
> What do you mean by "flip" ?

Sorry, I was unclear on this. I meant exchanging two hard drives with each
other within a single OSD host, e.g. /dev/sda => /dev/sdc and /dev/sdc =>
/dev/sda (for controller weirdness or whatever reason).
If I understand correctly, this should not be a problem at all, since the
OSD-ID and PARTUUID are unaffected by that (as you write, the LVM metadata
will persist with the device).
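
If I ever want to double-check after such a swap, I suppose something like the
following would show that the mapping survives device-node renaming (purely a
sanity check on my side):

    # block.db partitions are addressed by PARTUUID, not by /dev/sdX name.
    ls -l /dev/disk/by-partuuid/

    # Data LVs and the physical devices currently backing them.
    lvs -o lv_name,vg_name,devices

    # Full OSD id/fsid -> LV/partition mapping as seen by ceph-volume.
    ceph-volume lvm list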

Many thanks again for this very extensive reply!


> 
>>   I would guess so, since the services would detect the disk in the 
>> ceph-volume trigger phase.
>> - Is it still possible to take a disk from one OSD host, and put it in 
>> another one, or does this now need a manual interaction?
>>   With ceph-disk / udev, it did not, since udev triggered disk activation 
>> and then the service was created at runtime.
> 
> It is technically possible, the lvm part of it was built with this in
> mind. The LVM metadata will persist with the device, so this is not a
> problem. Just manual activation would be needed.
> 
>>
>> Many thanks for your help and cheers,
>>         Oliver

