Yes, that's the blueprint where we're tracking it. I've added work items
there based on the spec we have so far.
On Mon, Nov 25, 2013 at 6:37 AM, Mike Scherbakov <[email protected]> wrote:
> Dmitry,
> are we tracking this effort in the blueprint
> https://blueprints.launchpad.net/fuel/+spec/ceph-live-migration ?
>
> Can you add the work items which need to be done to the Work Items
> section, so we can track everything there?
>
> Thanks,
>
>
> On Wed, Nov 20, 2013 at 1:55 AM, Andrey Korolyov <[email protected]>
> wrote:
>>
>> On 11/20/2013 12:32 AM, Dmitry Borodaenko wrote:
>> > Yes, we were able to live-migrate an instance today. After
>> > re-migration back to the original node the instance began reporting
>> > weird I/O errors on some commands; Ryan is re-testing to check
>> > whether the same problem occurs again or was a Cirros-specific fluke.
>> >
>> > Here's our task list based on the research so far:
>> > 1) patch Nova to add CoW from images to instance boot drives as per
>> > OSCI-773.
>> > 2) patch Nova to disable the shared filesystem check for live
>> > migration of non-volume-backed instances (we have a hack in place;
>> > I'm working on a proper patch).
>> > 3) patch Nova to remove 'rbd ls' from the rbd driver as per Ceph
>> > #6693 found by Andrey K.
>> > 4) patch the Ceph manifests to create a new 'compute' Ceph user,
>> > keyring, and pool for Nova (we tested with the images user so far),
>> > and to use the 'compute' user instead of 'volumes' when defining the
>> > libvirt secret.
>> > 5) figure out the TLS and TCP auth configuration for libvirt: we had
>> > to disable it to make live migrations work; we have to investigate
>> > how to make them work in a more secure configuration and patch the
>> > Ceph manifests accordingly.
>>
>> I suppose the patch should go into libvirt.pp, not the Ceph manifests.
>> I have mixed feelings on this topic. On one hand, there is no actual
>> reason to wrap migration traffic inside an intranet in a secure layer
>> (ssh/TLS), and of course I am doing the same at Flops. On the other
>> hand, we should take care not to release insecure sh*t even if the
>> rest is nowhere near ready for the same level of security.
>> Implementation complexity aside, the most proper way to do this
>> without a performance impact is, of course, the TCP+TLS transport.
>> Keep in mind that there is no privilege separation in libvirt, and
>> once we've opened a plain TCP socket for migrations, any user with any
>> kind of access to the local subnet can gain complete control over
>> libvirt instances and the VMs under it. Generally we _may_ release the
>> simple TCP transport, but then a follow-up patch with a proper TLS
>> implementation should go into the MUST section of one of the upcoming
>> releases. The TLS transport will require a single-point CA (either
>> just a bunch of files rsynced across the controllers, or something
>> more mature like [1]), but either way the amount of work is small
>> compared to the resulting improvement in current OpenStack security.
>> We may be blamed in the communities for the most simplistic and
>> straightforward implementation, but it is the shortest and smartest
>> way to get this on board right now.
>>
>> [1.] http://www.ejbca.org/
>>
>> > 6) patch the Ceph manifests to modify nova.conf (enable the RBD
>> > backend, configure the Ceph pool and user credentials, etc.).
>> > 7) patch the OpenStack manifests to open the libvirt qemu/kvm live
>> > migration ports between compute nodes; report a Nova bug about live
>> > migration being silently cancelled without reporting the libvirt
>> > failure to connect.
>> >
>> > Can anyone help with item (5) above?
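As a starting point for item (5), here is a minimal sketch of the
libvirtd listener settings involved, assuming stock libvirt file
locations (the plain-TCP variant is the insecure fallback described
above; the TLS variant is the target Andrey describes):

    # /etc/libvirt/libvirtd.conf -- insecure plain-TCP fallback
    listen_tls = 0
    listen_tcp = 1
    auth_tcp = "none"   # no authentication; anyone on the subnet gets control

    # /etc/libvirt/libvirtd.conf -- TCP+TLS target
    listen_tls = 1
    auth_tls = "none"   # the client certificate itself is the credential
    ca_file   = "/etc/pki/CA/cacert.pem"
    cert_file = "/etc/pki/libvirt/servercert.pem"
    key_file  = "/etc/pki/libvirt/private/serverkey.pem"

    # libvirtd only opens these sockets when started with --listen,
    # e.g. libvirtd_opts="-d -l" in /etc/default/libvirt-bin on Ubuntu.

Nova then needs a matching migration URI in nova.conf, e.g.
live_migration_uri=qemu+tcp://%s/system for the plain-TCP case, switching
to qemu+tls://%s/system once the CA and certificates are deployed.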
>> >
>> >
>> > On Tue, Nov 19, 2013 at 2:53 AM, Mike Scherbakov
>> > <[email protected]> wrote:
>> >> I'd like to keep all the issues on the subject in a single email
>> >> thread, so here is what I copy-pasted from A. Korolyov:
>> >>> http://tracker.ceph.com/issues/6693
>> >>
>> >> Also, I don't see any reason for keeping this conversation private,
>> >> so I'm adding fuel-dev.
>> >>
>> >> Dmitry - any successes so far in your research?
>> >>
>> >>
>> >> On Tue, Nov 19, 2013 at 1:53 AM, Dmitry Borodaenko
>> >> <[email protected]> wrote:
>> >>>
>> >>> The reason it's not a limitation for a volume-backed instance is
>> >>> this misguided conditional:
>> >>>
>> >>> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3922
>> >>>
>> >>> It assumes that only a volume-backed instance without ephemeral
>> >>> disks can be live-migrated without shared storage. I also found
>> >>> many other places in the live migration code in Nova making the
>> >>> same assumption. What I have not found so far is any real reason
>> >>> for shared storage to be required for anything other than backing
>> >>> the instance's boot drive, which is no longer a concern with the
>> >>> Ephemeral RBD patch. I'll try to disable this and other similar
>> >>> checks and see if that makes live migration work for an instance
>> >>> backed by RBD.
>> >>>
>> >>> If that's the case and there are no other blockers in nova,
>> >>> libvirt, or qemu, fixing this in Nova will indeed be relatively
>> >>> straightforward.
>> >>>
>> >>> -Dmitry
>> >>>
>> >>> On Mon, Nov 18, 2013 at 9:37 AM, Mike Scherbakov
>> >>> <[email protected]> wrote:
>> >>>> If an instance boots from a volume, Nova should not have such a
>> >>>> limitation. So if it has one, it might be easier to fix Nova
>> >>>> instead.
>> >>>>
>> >>>>
>> >>>> On Mon, Nov 18, 2013 at 8:56 PM, Dmitry Borodaenko
>> >>>> <[email protected]> wrote:
>> >>>>>
>> >>>>> I used the patched packages built by the OSCI team per Jira
>> >>>>> OSCI-773; there are two more patches on the branch mentioned in
>> >>>>> the thread on ceph-users, and I still need to review and test
>> >>>>> those.
>> >>>>>
>> >>>>> We have seen the same error reported in this thread about shared
>> >>>>> storage: Nova requires /var/lib/nova to be shared between all
>> >>>>> compute nodes for live migrations. I am still waiting for Haomai
>> >>>>> to confirm whether he was able to overcome this limitation. If
>> >>>>> not, we will have to add glusterfs or cephfs, which is too much
>> >>>>> work for the 4.0 timeframe.
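To make the shared-storage check concrete, here is a simplified Python
paraphrase of the kind of conditional Dmitry points at; this is not the
verbatim Nova source, and _is_shared_storage is a hypothetical stand-in
for the actual shared-storage probe:

    # Paraphrase of the Havana-era check in nova/virt/libvirt/driver.py:
    # without shared storage, only a volume-backed instance with no
    # local disks at all is allowed to live-migrate.
    def check_can_live_migrate_source(self, instance, dest_check_data):
        shared = self._is_shared_storage(dest_check_data)  # stand-in helper
        has_local_disks = bool(self.get_instance_disk_info(instance['name']))
        if not shared and not (dest_check_data.get('is_volume_backed')
                               and not has_local_disks):
            raise exception.InvalidSharedStorage(
                reason='Live migration cannot be used without shared storage.')
        # With the Ephemeral RBD patch the boot drive lives in Ceph, so
        # this condition rejects instances that could migrate safely.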
>> >>>>>
>> >>>>> On Nov 18, 2013 1:32 AM, "Mike Scherbakov"
>> >>>>> <[email protected]> wrote:
>> >>>>>>
>> >>>>>> Dmitry - sorry for the late response.
>> >>>>>> It is good news - I remember the time when we were experimenting
>> >>>>>> with DRBD, and now we will have Ceph, which should be way better
>> >>>>>> for the purposes we need it for.
>> >>>>>>
>> >>>>>>> works with the patched Nova packages
>> >>>>>> What patches did you apply? Is the OSCI team already aware?
>> >>>>>>
>> >>>>>> As we merged havana into master, what are your estimates on
>> >>>>>> enabling all of this? We had a meeting with Roman and David, and
>> >>>>>> we really want to have live migration enabled in 4.0 (see #6
>> >>>>>> here:
>> >>>>>> https://mirantis.jira.com/wiki/display/PRD/4.0+-+Mirantis+OpenStack+release+home+page)
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Nov 13, 2013 at 12:39 AM, Dmitry Borodaenko
>> >>>>>> <[email protected]> wrote:
>> >>>>>>>
>> >>>>>>> Ephemeral storage in Ceph works with the patched Nova packages;
>> >>>>>>> we can start updating our Ceph manifests as soon as we have the
>> >>>>>>> havana branch merged into fuel master!
>> >>>>>>>
>> >>>>>>> ---------- Forwarded message ----------
>> >>>>>>> From: Dmitry Borodaenko <[email protected]>
>> >>>>>>> Date: Tue, Nov 12, 2013 at 12:38 PM
>> >>>>>>> Subject: Re: Ephemeral RBD with Havana and Dumpling
>> >>>>>>> To: [email protected]
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> And to answer my own question, I was missing a meaningful error
>> >>>>>>> message: what the ObjectNotFound exception I got from librados
>> >>>>>>> didn't tell me was that I didn't have the images keyring file
>> >>>>>>> in /etc/ceph/ on my compute node. After 'ceph auth
>> >>>>>>> get-or-create client.images >
>> >>>>>>> /etc/ceph/ceph.client.images.keyring' and reverting the images
>> >>>>>>> caps back to their original state, it all works!
>> >>>>>>>
>> >>>>>>> On Tue, Nov 12, 2013 at 12:19 PM, Dmitry Borodaenko
>> >>>>>>> <[email protected]> wrote:
>> >>>>>>>> I can get ephemeral storage for Nova to work with the RBD
>> >>>>>>>> backend, but I don't understand why it only works with the
>> >>>>>>>> admin cephx user. With a different user, starting a VM fails,
>> >>>>>>>> even if I set its caps to 'allow *'.
>> >>>>>>>>
>> >>>>>>>> Here's what I have in nova.conf:
>> >>>>>>>> libvirt_images_type=rbd
>> >>>>>>>> libvirt_images_rbd_pool=images
>> >>>>>>>> rbd_secret_uuid=fd9a11cc-6995-10d7-feb4-d338d73a4399
>> >>>>>>>> rbd_user=images
>> >>>>>>>>
>> >>>>>>>> The secret UUID is defined following the same steps as for
>> >>>>>>>> Cinder and Glance:
>> >>>>>>>> http://ceph.com/docs/master/rbd/libvirt/
>> >>>>>>>>
>> >>>>>>>> BTW the rbd_user option doesn't seem to be documented
>> >>>>>>>> anywhere; is that a documentation bug?
>> >>>>>>>>
>> >>>>>>>> And here's what 'ceph auth list' tells me about my cephx users:
>> >>>>>>>>
>> >>>>>>>> client.admin
>> >>>>>>>>         key: AQCoSX1SmIo0AxAAnz3NffHCMZxyvpz65vgRDg==
>> >>>>>>>>         caps: [mds] allow
>> >>>>>>>>         caps: [mon] allow *
>> >>>>>>>>         caps: [osd] allow *
>> >>>>>>>> client.images
>> >>>>>>>>         key: AQC1hYJS0LQhDhAAn51jxI2XhMaLDSmssKjK+g==
>> >>>>>>>>         caps: [mds] allow
>> >>>>>>>>         caps: [mon] allow *
>> >>>>>>>>         caps: [osd] allow *
>> >>>>>>>> client.volumes
>> >>>>>>>>         key: AQALSn1ScKruMhAAeSETeatPLxTOVdMIt10uRg==
>> >>>>>>>>         caps: [mon] allow r
>> >>>>>>>>         caps: [osd] allow class-read object_prefix
>> >>>>>>>> rbd_children, allow rwx pool=volumes, allow rx pool=images
>> >>>>>>>>
>> >>>>>>>> Setting rbd_user to images or volumes doesn't work.
>> >>>>>>>>
>> >>>>>>>> What am I missing?
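The answer, per Dmitry's follow-up quoted above, is to export the cephx
keyring to the compute node and register the key with libvirt. A sketch
of the commands, assuming the default /etc/ceph paths and a secret.xml
written as in the ceph.com libvirt guide linked above:

    # Export the cephx keyring to the compute node (the missing piece):
    ceph auth get-or-create client.images > /etc/ceph/ceph.client.images.keyring

    # Register the key with libvirt under the UUID referenced in nova.conf:
    virsh secret-define --file secret.xml
    virsh secret-set-value \
        --secret fd9a11cc-6995-10d7-feb4-d338d73a4399 \
        --base64 "$(ceph auth get-key client.images)"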
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Dmitry Borodaenko
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Dmitry Borodaenko
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Dmitry Borodaenko
>> >>>>>>
>> >>>>>> --
>> >>>>>> Mike Scherbakov
>> >>>>
>> >>>> --
>> >>>> Mike Scherbakov
>> >>>
>> >>> --
>> >>> Dmitry Borodaenko
>> >>
>> >> --
>> >> Mike Scherbakov
>> >
> --
> Mike Scherbakov
> #mihgen

--
Dmitry Borodaenko

--
Mailing list: https://launchpad.net/~fuel-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~fuel-dev
More help   : https://help.launchpad.net/ListHelp

