Hi,
I gave this a try a few months ago, and what I found is that there is a
difference between a storage pool and a disk declaration in libvirt.
I'll take the LVM storage pool as an example: in src/storage you will
find storage_backend_logical.c|h. These are simple "wrappers" around the
LVM commands like lvcreate, lvremove, etc.
static int
virStorageBackendLogicalDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED,
                                  virStoragePoolObjPtr pool ATTRIBUTE_UNUSED,
                                  virStorageVolDefPtr vol,
                                  unsigned int flags ATTRIBUTE_UNUSED)
{
    const char *cmdargv[] = {
        LVREMOVE, "-f", vol->target.path, NULL
    };

    if (virRun(cmdargv, NULL) < 0)
        return -1;

    return 0;
}
virStorageBackend virStorageBackendLogical = {
    .type = VIR_STORAGE_POOL_LOGICAL,
    ....
    .deleteVol = virStorageBackendLogicalDeleteVol,
    ....
};
As you can see, libvirt simply calls "lvremove" to remove the volume,
but this does not help you map the LV to a virtual machine; it's just a
mechanism to manage your storage via libvirt, as you can do with
Virt-Manager (which uses libvirt).
Below are two screenshots of how this works in Virt-Manager; as you can
see, you can manage your VGs and attach LVs to a virtual machine:
* http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_allocation.png
* http://zooi.widodh.nl/ceph/qemu-kvm/screenshots/storage_manager_virt.png
Note, this is Virt-Manager and not libvirt, but it uses libvirt to
perform these actions.
On the CLI you have for example: vol-create, vol-delete, pool-create and
pool-delete.
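For example (the pool, volume and file names here are just placeholders):

$ virsh pool-create lvm-pool.xml             # start a pool from an XML definition
$ virsh vol-create guest_images_lvm vol.xml  # create an LV in that pool
$ virsh vol-delete volume1 --pool guest_images_lvm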
But there is no special disk format for an LV; in my XML there is:
<disk type='block' device='disk'>
  <source dev='/dev/xen-domains/v3-root'/>
  <target dev='sda' bus='scsi'/>
</disk>
So libvirt somehow reads "source dev" and maps this back to a VG and LV.
A storage manager for RBD would simply mean implementing wrapper
functions around the "rbd" binary and parsing its output.
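As a rough illustration (just a sketch: virStorageBackendRBDDeleteVol is a
made-up name, the fields and the "rbd" invocation are my assumptions, and
nothing like this exists in libvirt yet), deleting an RBD image could be
wrapped much like the lvremove call above:

static int
virStorageBackendRBDDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED,
                              virStoragePoolObjPtr pool,
                              virStorageVolDefPtr vol,
                              unsigned int flags ATTRIBUTE_UNUSED)
{
    /* Hypothetical: run "rbd rm --pool <pool> <image>"; in a real patch
     * "rbd" would be a configure-detected path, like LVREMOVE above, and
     * pool->def->source.name / vol->name are assumed to hold the Ceph
     * pool and image names. */
    const char *cmdargv[] = {
        "rbd", "rm", "--pool", pool->def->source.name, vol->name, NULL
    };

    if (virRun(cmdargv, NULL) < 0)
        return -1;

    return 0;
}

Create, list and resize would follow the same pattern: shell out to "rbd"
and parse its output.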
Implementing RBD support in libvirt would then mean two things:
1. A storage manager in libvirt
2. A special disk format for RBD
The first one is done as I explained above, but for the second one I'm
not sure how you could do that.
Libvirt currently expects a disk to always be a file or block device;
virtual disks like RBD and NBD are not supported.
For #2 we would need a "special" disk declaration format, like the one
mentioned on the Red Hat mailing list:
http://www.redhat.com/archives/libvir-list/2010-June/msg00300.html
<disk type='rbd' device='disk'>
  <driver name='qemu' type='raw'/>
  <source pool='rbd' image='alpha'/>
  <target dev='vda' bus='virtio'/>
</disk>
As RBD images are always "raw", it might seem redundant to define this,
but newer versions of Qemu no longer autodetect formats.
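(For reference: with the pending Qemu patches this disk would presumably
end up on the Qemu command line as the magic string Sage mentions below,
something along the lines of

  -drive file=rbd:rbd/alpha,format=raw,if=virtio

although the exact spelling of course depends on those patches.)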
Defining a monitor in the disk declaration won't be possible, I think; I
don't see a way to get that parameter down to librados, so we would need
a valid /etc/ceph/ceph.conf.
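Something like the following in /etc/ceph/ceph.conf would then provide
the monitor addresses (just a sketch; the exact section and key names
depend on your Ceph version, and the hostnames are taken from Sage's
example below):

[mon.a]
        mon addr = ceph-mon1.domain.com:6789
[mon.b]
        mon addr = ceph-mon2.domain.com:6789
[mon.c]
        mon addr = ceph-mon3.domain.com:6789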
Now, I'm not a libvirt expert; this is just what I found in my search.
Any suggestions / thoughts about this?
Thanks,
Wido
On Mon, 2010-11-01 at 20:52 -0700, Sage Weil wrote:
> Hi,
>
> We've been working on RBD, a distributed block device backed by the Ceph
> distributed object store. (Ceph is a highly scalable, fault tolerant
> distributed storage and file system; see http://ceph.newdream.net.)
> Although the Ceph file system client has been in Linux since 2.6.34, the
> RBD block device was just merged for 2.6.37. We also have patches pending
> for Qemu that use librados to natively talk to the Ceph storage backend,
> avoiding any kernel dependency.
>
> To support disks backed by RBD in libvirt, we originally proposed a
> 'virtual' type that simply passed the configuration information through to
> qemu, but that idea was shot down for a variety of reasons:
>
> http://www.redhat.com/archives/libvir-list/2010-June/thread.html#00257
>
> It sounds like the "right" approach is to create a storage pool type.
> Ceph also has a 'pool' concept that contains some number of RBD images and
> a command line tool to manipulate (create, destroy, resize, rename,
> snapshot, etc.) those images, which seems to map nicely onto the storage
> pool abstraction. For example,
>
> $ rbd create foo -s 1000
> rbd image 'foo':
> size 1000 MB in 250 objects
> order 22 (4096 KB objects)
> adding rbd image to directory...
> creating rbd image...
> done.
> $ rbd create bar -s 10000
> [...]
> $ rbd list
> bar
> foo
>
> Something along the lines of
>
> <pool type="rbd">
> <name>virtimages</name>
> <source mode="kernel">
> <host monitor="ceph-mon1.domain.com:6789"/>
> <host monitor="ceph-mon2.domain.com:6789"/>
> <host monitor="ceph-mon3.domain.com:6789"/>
> <pool name="rbd"/>
> </source>
> </pool>
>
> or whatever (I'm not too familiar with the libvirt schema)? One
> difference between the existing pool types listed at
> libvirt.org/storage.html is that RBD does not necessarily associate itself
> with a path in the local file system. If the native qemu driver is used,
> there is no path involved, just a magic string passed to qemu
> (rbd:poolname/imagename). If the kernel RBD driver is used, it gets
> mapped to a /dev/rbd/$n (or similar, depending on the udev rule), but $n
> is not static across reboots.
>
> In any case, before someone goes off and implements something, does this
> look like the right general approach to adding rbd support to libvirt?
>
> Thanks!
> sage
>