Dmitry, are we tracking this effort in the blueprint https://blueprints.launchpad.net/fuel/+spec/ceph-live-migration ?
Can you add the work items that need to be done to the Work Items section, so
we can track everything there? Thanks,

On Wed, Nov 20, 2013 at 1:55 AM, Andrey Korolyov <[email protected]> wrote:

> On 11/20/2013 12:32 AM, Dmitry Borodaenko wrote:
> > Yes, we were able to live-migrate an instance today. After re-migration
> > back to the original node the instance began reporting weird I/O errors
> > on some commands; Ryan is re-testing to check whether the same problem
> > occurs again or was a Cirros-specific fluke.
> >
> > Here's our task list based on the research so far:
> > 1) patch Nova to add CoW from images to instance boot drives as per
> > OSCI-773.
> > 2) patch Nova to disable the shared-filesystem check for live migration
> > of non-volume-backed instances (we have a hack in place, I'm working on
> > a proper patch)
> > 3) patch Nova to remove 'rbd ls' from the rbd driver as per Ceph #6693,
> > found by Andrey K.
> > 4) patch the Ceph manifests to create a new 'compute' Ceph user,
> > keyring, and pool for Nova (we tested with the images user so far), and
> > to use the 'compute' user instead of 'volumes' when defining the
> > libvirt secret.
> > 5) figure out the TLS and TCP auth configuration for libvirt: we had to
> > disable it to make live migrations work; we still have to investigate
> > how to make them work in a more secure configuration and patch the Ceph
> > manifests accordingly
>
> I suppose the patch should go into libvirt.pp, not the Ceph manifests. I
> am of two minds on this topic. On the one hand, there is no actual reason
> to wrap migration traffic inside the intranet in a secure layer, ssh/tls,
> and of course I do the same at Flops. On the other hand, we should take
> care not to release insecure sh*t even if the rest is nowhere near the
> same level of security. Implementation complexity aside, the most proper
> way to do this without a performance impact is, of course, the TCP+TLS
> transport. One should keep in mind that there is no privilege separation
> in libvirt, and once we've opened a plain TCP socket for migrations, any
> user with any kind of access to the local subnet can gain complete
> control over libvirt instances and the VMs under them. Generally, we
> _may_ release the simple TCP transport now but put a follow-up patch with
> a proper TLS implementation in the MUST section of one of the upcoming
> releases. The TLS transport will require a single-point CA - either just
> a bunch of files rsynced over the controllers or something more mature
> like [1] - but either way the amount of work is small compared to the
> resulting improvement in current OpenStack security. The communities may
> blame us for the most stupid and straightforward implementation, but it
> is the shortest and smartest way to put things on board right now.
>
> [1] http://www.ejbca.org/
>
> > 6) patch the Ceph manifests to modify nova.conf (enable the RBD
> > backend, configure the Ceph pool and user credentials, etc.)
> > 7) patch the OpenStack manifests to open the libvirt qemu/kvm
> > live-migration ports between compute nodes; report a Nova bug about
> > live migration being silently cancelled without reporting the libvirt
> > failure to connect.
> >
> > Can anyone help with item (5) above?
> >
> >
> > On Tue, Nov 19, 2013 at 2:53 AM, Mike Scherbakov
> > <[email protected]> wrote:
> >> I'd like to keep all the issues on the subject in a single email
> >> thread, so here is what I copy-pasted from A. Korolyov:
> >>> http://tracker.ceph.com/issues/6693
> >>
> >> Also, I don't see any reason for keeping this conversation private,
> >> so I'm adding fuel-dev.
> >>
> >> Dmitry - any successes so far in your research?
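For context on item 5 above and the TCP-vs-TLS trade-off Andrey describes:
libvirt's listener is configured in /etc/libvirt/libvirtd.conf (plus the
--listen daemon flag). A minimal sketch of the two variants under discussion
follows; the option names are standard libvirtd.conf settings, the
certificate paths are libvirt's defaults, and the concrete values for our
manifests are still to be decided:

    # /etc/libvirt/libvirtd.conf -- plain-TCP variant we run today:
    listen_tls = 0
    listen_tcp = 1
    auth_tcp = "none"   # no auth: anyone who can reach the port owns libvirt

    # /etc/libvirt/libvirtd.conf -- TLS variant Andrey argues for:
    listen_tls = 1
    listen_tcp = 0
    key_file  = "/etc/pki/libvirt/private/serverkey.pem"
    cert_file = "/etc/pki/libvirt/servercert.pem"
    ca_file   = "/etc/pki/CA/cacert.pem"
    tls_no_verify_certificate = 0

In either case libvirtd must be started with --listen, and live_migration_uri
in nova.conf has to match the chosen transport (qemu+tcp:// vs. qemu+tls://).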
> >> On Tue, Nov 19, 2013 at 1:53 AM, Dmitry Borodaenko
> >> <[email protected]> wrote:
> >>>
> >>> The reason it's not a limitation for a volume-backed instance is this
> >>> misguided conditional:
> >>>
> >>> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3922
> >>>
> >>> It assumes that only a volume-backed instance without ephemeral disks
> >>> can be live-migrated without shared storage. I also found many other
> >>> places in the Nova live-migration code making the same assumption.
> >>> What I have not found so far is any real reason for shared storage to
> >>> be required for anything other than backing the instance's boot
> >>> drive, which is no longer a concern with the ephemeral RBD patch.
> >>> I'll try to disable this and other similar checks and see whether
> >>> that makes live migration work for an instance backed by RBD.
> >>>
> >>> If that's the case and there are no other blockers in nova, libvirt,
> >>> or qemu, fixing this in Nova will indeed be relatively
> >>> straightforward.
> >>>
> >>> -Dmitry
> >>>
> >>> On Mon, Nov 18, 2013 at 9:37 AM, Mike Scherbakov
> >>> <[email protected]> wrote:
> >>>> If an instance boots from a volume, Nova should not have such a
> >>>> limitation. So if it does, it might be easier to fix Nova instead.
> >>>>
> >>>> On Mon, Nov 18, 2013 at 8:56 PM, Dmitry Borodaenko
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> I used patched packages built by the OSCI team per Jira OSCI-773;
> >>>>> there are two more patches on the branch mentioned in the thread
> >>>>> on ceph-users, and I still need to review and test those.
> >>>>>
> >>>>> We have seen the same error about shared storage reported on that
> >>>>> thread: Nova requires /var/lib/nova to be shared between all
> >>>>> compute nodes for live migrations. I am still waiting for Haomai
> >>>>> to confirm whether he was able to overcome this limitation. If
> >>>>> not, we will have to add glusterfs or cephfs, which is too much
> >>>>> work for the 4.0 timeframe.
> >>>>>
> >>>>> On Nov 18, 2013 1:32 AM, "Mike Scherbakov"
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> Dmitry - sorry for the late response.
> >>>>>> This is good news - I remember the time when we were
> >>>>>> experimenting with DRBD, and now we will have Ceph, which should
> >>>>>> be way better for the purposes we need it for.
> >>>>>>
> >>>>>>> works with the patched Nova packages
> >>>>>> Which patches did you apply? Is the OSCI team already aware?
> >>>>>>
> >>>>>> As we merged havana into master, what are your estimates on
> >>>>>> enabling all of this? We had a meeting with Roman and David, and
> >>>>>> we really want to have live migration enabled in 4.0 (see #6
> >>>>>> here:
> >>>>>> https://mirantis.jira.com/wiki/display/PRD/4.0+-+Mirantis+OpenStack+release+home+page)
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> On Wed, Nov 13, 2013 at 12:39 AM, Dmitry Borodaenko
> >>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Ephemeral storage in Ceph works with the patched Nova packages;
> >>>>>>> we can start updating our Ceph manifests as soon as we have the
> >>>>>>> havana branch merged into fuel master!
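To make the assumption concrete: the conditional Dmitry points at, and its
siblings elsewhere in the live-migration path, boil down to a check of
roughly the following shape. This is an illustrative paraphrase with
simplified names, not the exact Havana source:

    # Paraphrase of the Havana-era shared-storage check in
    # nova/virt/libvirt/driver.py; illustrative only.

    class InvalidLocalStorage(Exception):
        pass

    class InvalidSharedStorage(Exception):
        pass

    def check_can_live_migrate(shared_storage, block_migration,
                               booted_from_volume, has_local_disks):
        if block_migration:
            if shared_storage:
                # Block migration copies the disks itself, so shared
                # storage must not be in use.
                raise InvalidLocalStorage("block migration needs local disks")
        elif not shared_storage and (not booted_from_volume or has_local_disks):
            # The assumption in question: any instance with a local
            # (ephemeral) disk is presumed to need shared storage, even
            # when that disk actually lives in RBD.
            raise InvalidSharedStorage("live migration needs shared storage")

Relaxing the elif branch for RBD-backed ephemeral disks is what item 2 in
Dmitry's task list above refers to as the "proper patch".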
> >>>>>>> ---------- Forwarded message ----------
> >>>>>>> From: Dmitry Borodaenko <[email protected]>
> >>>>>>> Date: Tue, Nov 12, 2013 at 12:38 PM
> >>>>>>> Subject: Re: Ephemeral RBD with Havana and Dumpling
> >>>>>>> To: [email protected]
> >>>>>>>
> >>>>>>> And to answer my own question, I was missing a meaningful error
> >>>>>>> message: what the ObjectNotFound exception I got from librados
> >>>>>>> didn't tell me was that I didn't have the images keyring file in
> >>>>>>> /etc/ceph/ on my compute node. After 'ceph auth get-or-create
> >>>>>>> client.images > /etc/ceph/ceph.client.images.keyring' and
> >>>>>>> reverting the images caps back to their original state, it all
> >>>>>>> works!
> >>>>>>>
> >>>>>>> On Tue, Nov 12, 2013 at 12:19 PM, Dmitry Borodaenko
> >>>>>>> <[email protected]> wrote:
> >>>>>>>> I can get ephemeral storage for Nova to work with the RBD
> >>>>>>>> backend, but I don't understand why it only works with the
> >>>>>>>> admin cephx user. With a different user, starting a VM fails,
> >>>>>>>> even if I set its caps to 'allow *'.
> >>>>>>>>
> >>>>>>>> Here's what I have in nova.conf:
> >>>>>>>> libvirt_images_type=rbd
> >>>>>>>> libvirt_images_rbd_pool=images
> >>>>>>>> rbd_secret_uuid=fd9a11cc-6995-10d7-feb4-d338d73a4399
> >>>>>>>> rbd_user=images
> >>>>>>>>
> >>>>>>>> The secret UUID is defined following the same steps as for
> >>>>>>>> Cinder and Glance:
> >>>>>>>> http://ceph.com/docs/master/rbd/libvirt/
> >>>>>>>>
> >>>>>>>> BTW, the rbd_user option doesn't seem to be documented
> >>>>>>>> anywhere; is that a documentation bug?
> >>>>>>>>
> >>>>>>>> And here's what 'ceph auth list' tells me about my cephx users:
> >>>>>>>>
> >>>>>>>> client.admin
> >>>>>>>>     key: AQCoSX1SmIo0AxAAnz3NffHCMZxyvpz65vgRDg==
> >>>>>>>>     caps: [mds] allow
> >>>>>>>>     caps: [mon] allow *
> >>>>>>>>     caps: [osd] allow *
> >>>>>>>> client.images
> >>>>>>>>     key: AQC1hYJS0LQhDhAAn51jxI2XhMaLDSmssKjK+g==
> >>>>>>>>     caps: [mds] allow
> >>>>>>>>     caps: [mon] allow *
> >>>>>>>>     caps: [osd] allow *
> >>>>>>>> client.volumes
> >>>>>>>>     key: AQALSn1ScKruMhAAeSETeatPLxTOVdMIt10uRg==
> >>>>>>>>     caps: [mon] allow r
> >>>>>>>>     caps: [osd] allow class-read object_prefix rbd_children,
> >>>>>>>>     allow rwx pool=volumes, allow rx pool=images
> >>>>>>>>
> >>>>>>>> Setting rbd_user to images or volumes doesn't work.
> >>>>>>>>
> >>>>>>>> What am I missing?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Dmitry Borodaenko
> >>>>>>>
> >>>>>>> --
> >>>>>>> Dmitry Borodaenko

--
Mike Scherbakov
#mihgen
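For reference, the "same steps as for Cinder and Glance" from the Ceph
libvirt document linked in the forwarded message amount to defining a
libvirt secret and attaching the cephx key to it. A sketch adapted to the
client.images user and the secret UUID from the nova.conf snippet above
(check the linked doc for the authoritative procedure):

    # secret.xml
    <secret ephemeral='no' private='no'>
      <uuid>fd9a11cc-6995-10d7-feb4-d338d73a4399</uuid>
      <usage type='ceph'>
        <name>client.images secret</name>
      </usage>
    </secret>

    # on each compute node:
    virsh secret-define --file secret.xml
    virsh secret-set-value --secret fd9a11cc-6995-10d7-feb4-d338d73a4399 \
        --base64 "$(ceph auth get-key client.images)"

Once item 4 lands, the same steps would apply to the 'compute' user and its
keyring instead.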
--
Mailing list: https://launchpad.net/~fuel-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~fuel-dev
More help   : https://help.launchpad.net/ListHelp

