For example, Punith from CloudByte sent out an e-mail yesterday that was
very similar to this thread, but he was wondering how to implement such a
concept on his company's SAN technology.

On Mon, Feb 16, 2015 at 10:40 AM, Mike Tutkowski <
mike.tutkow...@solidfire.com> wrote:

> Yeah, I think it's a similar concept, though.
>
> You would want to take snapshots on Ceph (or some other backend system
> that acts as primary storage) instead of copying data to secondary storage
> and calling it a snapshot.
>
> For Ceph or any other backend system like that, the idea is to speed up
> snapshots by not requiring CPU cycles on the front end or network bandwidth
> to transfer the data.
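>
> To make that concrete for Ceph/RBD, here is a rough, untested sketch of the
> kind of call the KVM agent could make with the rados-java bindings instead
> of streaming the data out (the pool, image, snapshot, and credential values
> below are made-up placeholders):
>
>     import com.ceph.rados.IoCTX;
>     import com.ceph.rados.Rados;
>     import com.ceph.rbd.Rbd;
>     import com.ceph.rbd.RbdImage;
>
>     public class RbdSnapshotSketch {
>         public static void main(String[] args) throws Exception {
>             Rados r = new Rados("admin");               // assumed cephx user
>             r.confSet("mon_host", "10.0.0.1");          // assumed monitor address
>             r.confSet("key", "<cephx key>");            // assumed secret
>             r.connect();
>
>             IoCTX io = r.ioCtxCreate("cloudstack");     // assumed RBD pool
>             Rbd rbd = new Rbd(io);
>             RbdImage image = rbd.open("volume-1234");   // image backing the CS volume
>
>             // The snapshot lives on the primary storage; nothing is copied
>             // through the hypervisor or to secondary storage.
>             image.snapCreate("cs-snapshot-1");
>
>             rbd.close(image);
>             r.ioCtxDestroy(io);
>         }
>     }
>
> The operation itself completes in seconds regardless of the volume size,
> which is exactly the point of doing it on the backend.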
>
> In that sense, this is a general-purpose CloudStack problem, and it appears
> you intend to discuss only the Ceph implementation here, which is fine.
>
> On Mon, Feb 16, 2015 at 10:34 AM, Logan Barfield <lbarfi...@tqhosting.com>
> wrote:
>
>> Hi Mike,
>>
>> I think the interest in this issue is primarily for Ceph RBD, which
>> doesn't use iSCSI or SAN concepts in general.  As well, I believe RBD is
>> currently only supported on KVM (and VMware?).  QEMU has native RBD
>> support, so it attaches the devices directly to the VMs in question.
>> It also natively supports snapshotting, which is what this discussion
>> is about.
>>
>> Thank You,
>>
>> Logan Barfield
>> Tranquil Hosting
>>
>>
>> On Mon, Feb 16, 2015 at 11:46 AM, Mike Tutkowski
>> <mike.tutkow...@solidfire.com> wrote:
>> > I should have also commented on KVM (since that was the hypervisor called
>> > out in the initial e-mail).
>> >
>> > In my situation, most of my customers use XenServer and/or ESXi, so KVM has
>> > received the fewest of my cycles with regard to those three hypervisors.
>> >
>> > KVM, though, is actually the simplest hypervisor for which to implement
>> > these changes (since I am using the iSCSI adapter of the KVM agent and it
>> > essentially just passes my LUN to the VM in question).
>> >
>> > For KVM, there is no clustered file system applied to my backend LUN, so I
>> > don't have to "worry" about that layer.
>> >
>> > I don't see any hurdles like *immutable* UUIDs of SRs and VDIs (as is the
>> > case with XenServer) or having to re-signature anything (as is the case
>> > with ESXi).
>> >
>> > On Mon, Feb 16, 2015 at 9:33 AM, Mike Tutkowski <
>> > mike.tutkow...@solidfire.com> wrote:
>> >
>> >> I have been working on this on and off for a while now (as time permits).
>> >>
>> >> Here is an e-mail I sent to a customer of ours that helps describe some of
>> >> the issues:
>> >>
>> >> *** Beginning of e-mail ***
>> >>
>> >> The main requests were around the following features:
>> >>
>> >> * The ability to leverage SolidFire snapshots.
>> >>
>> >> * The ability to create CloudStack templates from SolidFire snapshots.
>> >>
>> >> I had these on my roadmap, but bumped the priority up and began work on
>> >> them for the CS 4.6 release.
>> >>
>> >> During design, I realized there were issues with the way XenServer is
>> >> architected that prevented me from directly using SolidFire snapshots.
>> >>
>> >> I could definitely take a SolidFire snapshot of a SolidFire volume, but
>> >> this snapshot would not be usable from XenServer if the original volume
>> >> was still in use.
>> >>
>> >> Here is the gist of the problem:
>> >>
>> >> When XenServer leverages an iSCSI target such as a SolidFire volume, it
>> >> applies a clustered file system to it, which they call a storage
>> >> repository (SR). An SR has an *immutable* UUID associated with it.
>> >>
>> >> The virtual volume (which a VM sees as a disk) is represented by a
>> >> virtual disk image (VDI) in the SR. A VDI also has an *immutable* UUID
>> >> associated with it.
>> >>
>> >> If I take a snapshot (or a clone) of the SolidFire volume and then later
>> >> try to use that snapshot from XenServer, XenServer complains that the SR
>> >> on the snapshot has a UUID that conflicts with an existing UUID.
>> >>
>> >> In other words, it is not possible to use the original SR and the
>> >> snapshot of this SR from XenServer at the same time, even though that is
>> >> critical in a cloud environment (to enable creating templates from
>> >> snapshots).
>> >>
>> >> The way I have proposed circumventing this issue is not ideal, but
>> >> technically works (this code is checked into the CS 4.6 branch):
>> >>
>> >> When the time comes to take a CloudStack snapshot of a CloudStack volume
>> >> that is backed by SolidFire storage via the storage plug-in, the plug-in
>> >> will create a new SolidFire volume with characteristics (size and IOPS)
>> >> equal to those of the original volume.
>> >>
>> >> We then have XenServer attach to this new SolidFire volume, create a
>> >> *new* SR on it, and then copy the VDI from the source SR to the
>> >> destination SR (the new SR).
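>> >>
>> >> To make that a little more concrete, here is a rough, untested sketch of
>> >> the XenAPI calls involved, using the com.xensource.xenapi Java bindings
>> >> (the host address, credentials, UUIDs, and iSCSI device-config values are
>> >> all placeholders):
>> >>
>> >>     import java.net.URL;
>> >>     import java.util.HashMap;
>> >>     import java.util.Map;
>> >>     import com.xensource.xenapi.*;
>> >>
>> >>     public class VdiCopySketch {
>> >>         public static void main(String[] args) throws Exception {
>> >>             Connection conn = new Connection(new URL("https://xenserver-host"));
>> >>             Session.loginWithPassword(conn, "root", "password",
>> >>                     APIVersion.latest().toString());
>> >>
>> >>             Host host = Host.getAll(conn).iterator().next();
>> >>
>> >>             // Device config pointing at the *new* SolidFire volume (LUN).
>> >>             Map<String, String> deviceConfig = new HashMap<String, String>();
>> >>             deviceConfig.put("target", "10.0.0.2");                 // placeholder SVIP
>> >>             deviceConfig.put("targetIQN", "iqn.2010-01.com.solidfire:placeholder");
>> >>             deviceConfig.put("SCSIid", "<scsi id>");                // placeholder
>> >>
>> >>             // Creating the SR here gives it a brand-new (non-conflicting) UUID.
>> >>             SR newSr = SR.create(conn, host, deviceConfig, 0L,
>> >>                     "snapshot-sr", "SR for a CloudStack snapshot",
>> >>                     "lvmoiscsi", "user", true, new HashMap<String, String>());
>> >>
>> >>             // Copying the VDI is the step that costs hypervisor CPU and
>> >>             // network bandwidth.
>> >>             VDI sourceVdi = VDI.getByUuid(conn, "<source vdi uuid>");
>> >>             VDI copiedVdi = sourceVdi.copy(conn, newSr);
>> >>             System.out.println("Copied VDI: " + copiedVdi.getUuid(conn));
>> >>         }
>> >>     }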
>> >>
>> >> This leads to us having a copy of the VDI (a "snapshot" of sorts), but it
>> >> requires CPU cycles on the compute cluster as well as network bandwidth to
>> >> write to the SAN (thus it is slower and more resource intensive than a
>> >> SolidFire snapshot).
>> >>
>> >> I spoke with Tim Mackey (who works on XenServer at Citrix) concerning
>> >> this issue before and during the CloudStack Collaboration Conference in
>> >> Budapest in November. He agreed that this is a legitimate issue with the
>> >> way XenServer is designed and could not think of a way (other than what I
>> >> was doing) to get around it in current versions of XenServer.
>> >>
>> >> One thought is to have a feature added to XenServer that enables you to
>> >> change the UUID of an SR and of a VDI.
>> >>
>> >> If I could do that, then I could take a SolidFire snapshot of the
>> >> SolidFire volume and issue commands to XenServer to have it change the
>> >> UUIDs of the original SR and the original VDI. I could then record the
>> >> necessary UUID info in the CS DB.
>> >>
>> >> *** End of e-mail ***
>> >>
>> >> I have since investigated this on ESXi.
>> >>
>> >> ESXi does have a way for us to "re-signature" a datastore, so backend
>> >> snapshots can be taken and effectively used on this hypervisor.
>> >>
>> >> On Mon, Feb 16, 2015 at 8:19 AM, Logan Barfield <
>> lbarfi...@tqhosting.com>
>> >> wrote:
>> >>
>> >>> I'm just going to stick with the qemu-img option change for RBD for
>> >>> now (which should cut snapshot time down drastically), and look
>> >>> forward to this in the future.  I'd be happy to help get this moving,
>> >>> but I'm not enough of a developer to lead the charge.
>> >>>
>> >>> As far as renaming goes, I agree that maybe backups isn't the right
>> >>> word.  That being said, calling a full-sized copy of a volume a
>> >>> "snapshot" also isn't the right word.  Maybe "image" would be better?
>> >>>
>> >>> I've also got my reservations about "accounts" vs "users" (I think
>> >>> "departments" and "accounts or users" respectively is less confusing),
>> >>> but that's a different thread.
>> >>>
>> >>> Thank You,
>> >>>
>> >>> Logan Barfield
>> >>> Tranquil Hosting
>> >>>
>> >>>
>> >>> On Mon, Feb 16, 2015 at 10:04 AM, Wido den Hollander <w...@widodh.nl>
>> >>> wrote:
>> >>> >
>> >>> >
>> >>> > On 16-02-15 15:38, Logan Barfield wrote:
>> >>> >> I like this idea a lot for Ceph RBD.  I do think there should still
>> >>> >> be support for copying snapshots to secondary storage as needed (for
>> >>> >> transfers between zones, etc.).  I really think that this could be
>> >>> >> part of a larger move to clarify the naming conventions used for disk
>> >>> >> operations.  Currently "Volume Snapshots" should probably really be
>> >>> >> called "Backups".  So having "snapshot" functionality, and a "convert
>> >>> >> snapshot to backup/template" operation, would be a good move.
>> >>> >>
>> >>> >
>> >>> > I fully agree that this would be a great addition.
>> >>> >
>> >>> > I won't be able to work on this any time soon though.
>> >>> >
>> >>> > Wido
>> >>> >
>> >>> >> Thank You,
>> >>> >>
>> >>> >> Logan Barfield
>> >>> >> Tranquil Hosting
>> >>> >>
>> >>> >>
>> >>> >> On Mon, Feb 16, 2015 at 9:16 AM, Andrija Panic <
>> >>> andrija.pa...@gmail.com> wrote:
>> >>> >>> BIG +1
>> >>> >>>
>> >>> >>> My team should be submitting a patch to ACS for better KVM snapshots,
>> >>> >>> including whole-VM snapshots, etc., but it's too early to give details...
>> >>> >>> best
>> >>> >>>
>> >>> >>> On 16 February 2015 at 13:01, Andrei Mikhailovsky <
>> and...@arhont.com>
>> >>> wrote:
>> >>> >>>
>> >>> >>>> Hello guys,
>> >>> >>>>
>> >>> >>>> I was hoping to get some feedback from the community on the subject
>> >>> >>>> of having the ability to keep snapshots on the primary storage where
>> >>> >>>> this is supported by the storage backend.
>> >>> >>>>
>> >>> >>>> The idea behind this functionality is to improve how snapshots are
>> >>> >>>> currently handled on KVM hypervisors with Ceph primary storage. At the
>> >>> >>>> moment, snapshots are taken on the primary storage and then copied to
>> >>> >>>> the secondary storage. This method is very slow and inefficient even
>> >>> >>>> on a small infrastructure, and on medium deployments using snapshots
>> >>> >>>> in KVM becomes nearly impossible. If you have tens or hundreds of
>> >>> >>>> concurrent snapshots taking place, you get a bunch of timeouts and
>> >>> >>>> errors, and your network becomes clogged, etc. In addition, using
>> >>> >>>> these snapshots to create new volumes or revert VMs is also slow and
>> >>> >>>> inefficient. As above, when you have tens or hundreds of concurrent
>> >>> >>>> operations, most of them will fail with errors or timeouts.
>> >>> >>>>
>> >>> >>>> At the moment, taking a single snapshot of a relatively small volume
>> >>> >>>> (200GB or 500GB, for instance) takes tens if not hundreds of minutes.
>> >>> >>>> Taking a snapshot of the same volume on Ceph primary storage takes a
>> >>> >>>> few seconds at most! Similarly, converting a snapshot to a volume
>> >>> >>>> takes tens if not hundreds of minutes when secondary storage is
>> >>> >>>> involved, compared with seconds if done directly on the primary
>> >>> >>>> storage.
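>> >>> >>>>
>> >>> >>>> To illustrate, converting a snapshot into a new volume directly on
>> >>> >>>> Ceph is essentially a protect + copy-on-write clone, which is why it
>> >>> >>>> is nearly instant. A rough, untested sketch using the rados-java
>> >>> >>>> bindings (the pool, image, and snapshot names are made-up
>> >>> >>>> placeholders):
>> >>> >>>>
>> >>> >>>>     import com.ceph.rados.IoCTX;
>> >>> >>>>     import com.ceph.rados.Rados;
>> >>> >>>>     import com.ceph.rbd.Rbd;
>> >>> >>>>     import com.ceph.rbd.RbdImage;
>> >>> >>>>
>> >>> >>>>     public class RbdCloneSketch {
>> >>> >>>>         public static void main(String[] args) throws Exception {
>> >>> >>>>             Rados r = new Rados("admin");            // assumed cephx user
>> >>> >>>>             r.confSet("mon_host", "10.0.0.1");       // assumed monitor
>> >>> >>>>             r.confSet("key", "<cephx key>");         // assumed secret
>> >>> >>>>             r.connect();
>> >>> >>>>             IoCTX io = r.ioCtxCreate("cloudstack");  // assumed RBD pool
>> >>> >>>>
>> >>> >>>>             Rbd rbd = new Rbd(io);
>> >>> >>>>             RbdImage parent = rbd.open("volume-1234");
>> >>> >>>>             // A clone parent has to be a protected snapshot.
>> >>> >>>>             parent.snapProtect("cs-snapshot-1");
>> >>> >>>>             rbd.close(parent);
>> >>> >>>>
>> >>> >>>>             // Copy-on-write clone: no data is copied up front, so this
>> >>> >>>>             // returns almost immediately even for large volumes.
>> >>> >>>>             int features = (1 << 0);                 // RBD_FEATURE_LAYERING
>> >>> >>>>             rbd.clone("volume-1234", "cs-snapshot-1", io,
>> >>> >>>>                       "volume-from-snap", features, 0);
>> >>> >>>>
>> >>> >>>>             r.ioCtxDestroy(io);
>> >>> >>>>         }
>> >>> >>>>     }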
>> >>> >>>>
>> >>> >>>> I suggest that CloudStack should have the ability to keep volume
>> >>> >>>> snapshots on the primary storage where this is supported by the
>> >>> >>>> storage. Perhaps there could be a per-primary-storage setting that
>> >>> >>>> enables this functionality. This would be beneficial for Ceph primary
>> >>> >>>> storage on KVM hypervisors, and perhaps on XenServer once Ceph is
>> >>> >>>> supported there in the near future.
>> >>> >>>>
>> >>> >>>> This will greatly speed up the process of using snapshots on KVM, and
>> >>> >>>> users will actually start using snapshotting rather than giving up in
>> >>> >>>> frustration.
>> >>> >>>>
>> >>> >>>> I have opened the ticket CLOUDSTACK-8256, so please cast your vote if
>> >>> >>>> you are in agreement.
>> >>> >>>>
>> >>> >>>> Thanks for your input
>> >>> >>>>
>> >>> >>>> Andrei
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>
>> >>> >>>
>> >>> >>> --
>> >>> >>>
>> >>> >>> Andrija Panić
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> *Mike Tutkowski*
>> >> *Senior CloudStack Developer, SolidFire Inc.*
>> >> e: mike.tutkow...@solidfire.com
>> >> o: 303.746.7302
>> >> Advancing the way the world uses the cloud
>> >> <http://solidfire.com/solution/overview/?video=play>*™*
>> >>
>> >
>> >
>> >
>> > --
>> > *Mike Tutkowski*
>> > *Senior CloudStack Developer, SolidFire Inc.*
>> > e: mike.tutkow...@solidfire.com
>> > o: 303.746.7302
>> > Advancing the way the world uses the cloud
>> > <http://solidfire.com/solution/overview/?video=play>*™*
>>
>
>
>
> --
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkow...@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the cloud
> <http://solidfire.com/solution/overview/?video=play>*™*
>



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkow...@solidfire.com
o: 303.746.7302
Advancing the way the world uses the cloud
<http://solidfire.com/solution/overview/?video=play>*™*
