Nah, I guess it's not so bad to have a volumeIds param.  Just as long as it's
really clear that it implies nothing about consistency.

I would be a little concerned about how this will be implemented by other
storage providers though.  Currently if you do 5 snapshot API calls, that
will launch 5 threads and they will happen in parallel.  If they get
batched and sent in one thread, how is the framework/driver going to handle
the snapshots for drivers that don't support batching?  Sequential would
be bad, as sometimes it takes a while to snapshot.
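
To sketch what I mean (these interface and class names are invented for
illustration, not actual ACS code): the framework could hand the whole list
to drivers that advertise batching, and otherwise keep today's behavior of
one parallel call per volume, never sequential:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; these are not real CloudStack interfaces.
interface SnapshotDriver {
    String takeSnapshot(String volumeId);
}

interface BatchingSnapshotDriver extends SnapshotDriver {
    List<String> takeSnapshots(List<String> volumeIds);
}

class SnapshotDispatcher {
    // Daemon threads so a demo JVM can exit cleanly.
    private final ExecutorService pool = Executors.newFixedThreadPool(8,
            r -> { Thread t = new Thread(r); t.setDaemon(true); return t; });

    // Batch-capable drivers get the whole list; everyone else gets the
    // current behavior: one snapshot call per volume, run in parallel.
    List<String> snapshotAll(SnapshotDriver driver, List<String> volumeIds)
            throws Exception {
        if (driver instanceof BatchingSnapshotDriver) {
            return ((BatchingSnapshotDriver) driver).takeSnapshots(volumeIds);
        }
        List<Future<String>> futures = new ArrayList<>();
        for (String id : volumeIds) {
            futures.add(pool.submit(() -> driver.takeSnapshot(id)));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get());
        }
        return results;
    }
}
```

Something along those lines would preserve the 5-threads-in-parallel
behavior for drivers that don't support batching.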

The APIs you mentioned that take lists really just manipulate data in the
DB, so they can easily batch and transactionally do a bunch at once.
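
A toy illustration of why those list APIs are easy: a delete over DB rows
can be all-or-nothing.  This uses an in-memory stand-in for the table, not
the real DAO code:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a DB table: a batch delete either removes every
// requested row or, if any id is missing, leaves the table untouched,
// mimicking a transactional delete.
class EventTable {
    private final Set<String> rows = new HashSet<>();

    void insert(String id) { rows.add(id); }

    int size() { return rows.size(); }

    boolean deleteAll(List<String> ids) {
        if (!rows.containsAll(ids)) {
            return false;       // "rollback": nothing removed
        }
        rows.removeAll(ids);    // "commit": all rows removed at once
        return true;
    }
}
```

Snapshots can't be batched that way because each one is a long-running
operation against real hardware, not a row update.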

Maybe someone who's more familiar with the storage implementation can
comment?
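
For reference, the timed-window batching I floated further down the thread
(the "SnapshotBatching"/getBatchWindowTime() idea) might be sketched like
this.  All names are invented; it's only a rough illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Rough sketch of window-based batching: requests that arrive within the
// window are queued and handed off as a single batch when it closes.
class WindowedSnapshotBatcher {
    private final long windowMillis;
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> flushedBatches = new ArrayList<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);  // let a demo JVM exit cleanly
                return t;
            });
    private boolean flushScheduled = false;

    WindowedSnapshotBatcher(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    synchronized void requestSnapshot(String volumeId) {
        pending.add(volumeId);
        if (!flushScheduled) {
            flushScheduled = true;
            timer.schedule(this::flush, windowMillis, TimeUnit.MILLISECONDS);
        }
    }

    synchronized void flush() {
        // In a real driver this would be one batched call to the backend.
        flushedBatches.add(new ArrayList<>(pending));
        pending.clear();
        flushScheduled = false;
    }

    synchronized List<List<String>> batches() {
        return new ArrayList<>(flushedBatches);
    }
}
```

Whether that logic lives in the framework or in the driver is exactly the
open question.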

Darren


On Wed, Sep 18, 2013 at 12:46 PM, SuichII, Christopher <
chris.su...@netapp.com> wrote:

> That certainly would work, but I don't see how it is a better design. Can
> you elaborate on how sending multiple volumeIds is hackish? Look at the
> existing API framework. We have several APIs that accept lists as
> parameters. Normally, they're used for things like querying or deleting.
> Take a look at some of these commands:
> -ArchiveEventsCmd
> -DeleteEventsCmd
> -DeleteSnapshotPoliciesCmd
>
> This kind of API is simply a shorthand for invoking another API many times.
>
> I think it is only a NetApp optimization in the sense that we're the only
> ones who need it right now. What we're asking for has nothing specific to
> do with NetApp. We would just like the shorthand ability to do things all
> at once rather than one at a time. I think other vendors could utilize
> this just as easily.
>
> -Chris
> --
> Chris Suich
> chris.su...@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Sep 18, 2013, at 2:32 PM, Darren Shepherd <darren.s.sheph...@gmail.com>
> wrote:
>
> > Given this explanation, would the following not work?
> >
> > 1) Enhance the UI to allow multi-select.  There is no API change; the UI
> > will just call the snapshot API a bunch of times.
> > 2) Enhance the storage framework or driver to detect that 20 requests
> > just came in within a window of X seconds and send them to the driver
> > all at once.
> >
> > I know you said queuing on the backend is hackish, but having the user
> > send multiple volumeIds in the API is just as hackish to me.  We can
> > only guarantee to the user that the multiple snapshots taken are as
> > consistent as if they called the snapshot API individually.  The user
> > won't really know exactly what NetApp volume they exist on, and really
> > neither will the storage framework, as export != volume.  Only the
> > driver knows if batching is really possible.  So I'm not exactly saying
> > queue, it's short batches.
> >
> > In short, I'm seeing this as a bit more of a NetApp optimization than a
> > general thing.  I'm all for using storage-device-level snapshotting, but
> > it seems like it's going to be implementation specific.  It's
> > interesting: if you look at DigitalOcean, they have snapshots and
> > backups as two different concepts.  You can see that they ran into this
> > specific issue.  Full storage volume snapshots are really difficult to
> > expose to the user.  So DigitalOcean does "backups," which are live but
> > on a schedule and seem to be a full volume backup.  And then there are
> > snapshots, which are on demand but require you to stop your VM (so they
> > can essentially copy the qcow or LV somewhere).
> >
> > Darren
> >
> >
> >
> >
> > On Wed, Sep 18, 2013 at 11:12 AM, SuichII, Christopher <
> > chris.su...@netapp.com> wrote:
> >
> >> First, let me apologize for the confusing terms, because some words
> >> here are overloaded:
> >> A volume…
> >> In CloudStack terms is a disk attached to a VM.
> >> In NetApp terms is an NFS volume, analogous to CloudStack primary
> >> storage, where all the CloudStack volumes are stored.
> >>
> >> A snapshot…
> >> In CloudStack terms is a backup of a VM.
> >> In NetApp terms is a copy of all the contents of a NetApp volume, taken
> >> at a point in time to create an analogous CloudStack snapshot for (up
> >> to) every CloudStack volume on that primary storage.
> >>
> >> There are several reasons that an API for snapshotting multiple
> >> volumes is more attractive to us than calling a single-volume API over
> >> and over. A lot of it has to do with how we actually create the
> >> snapshots. Unlike a hypervisor snapshot, when we create a VM snapshot,
> >> the entire primary storage is backed up (but only the requested volume
> >> has an entry added to the db). To add on to this, our hardware has a
> >> hard limit of 255 storage-volume-level snapshots. So, if there were 255
> >> VMs on a single primary storage and each one of them performed a
> >> backup, no more backups could be taken before we start removing the
> >> oldest backup (without some trickery that we are currently working on).
> >> Some might say a solution to this would be queueing the requests and
> >> waiting till they're all finished, but that seems much more error-prone
> >> and hackish compared to simply allowing multiple VM volumes to be
> >> specified.
> >>
> >> This is both a request for optimizing the backend and optimizing the
> >> experience for users. What happens when a user says they want to back
> >> up 30 VM volumes at the same time? Is it not a cleaner experience to
> >> simply select all the volumes they want to back up, then click backup
> >> once? This way, the storage provider is given all the volumes at once,
> >> and if they have some way of optimizing the request based on their
> >> hardware or software, they can take advantage of that. It can even be
> >> designed in such a way that if storage providers don't want to be given
> >> all the volumes at once, they can be called with each one individually,
> >> so as to remain backwards compatible.
> >>
> >> Now, I'm also not saying that these two solutions can't co-exist. Even
> >> if we have the ability to back up multiple volumes at once, nothing is
> >> stopping users from backing them up one by one, so queueing is still
> >> something we may have to implement. However, I think extending the
> >> subsystem API to grant storage providers the ability to leverage any
> >> optimization they can without having to queue is a cleaner solution. If
> >> the concern is how users interpret what is going on in the backend, I
> >> think we can find some way to make that clear to them.
> >>
> >> -Chris
> >> --
> >> Chris Suich
> >> chris.su...@netapp.com
> >> NetApp Software Engineer
> >> Data Center Platforms – Cloud Solutions
> >> Citrix, Cisco & Red Hat
> >>
> >> On Sep 18, 2013, at 12:26 PM, Alex Huang <alex.hu...@citrix.com> wrote:
> >>
> >>> That's my read on the proposal also, but Chris, please clarify.  I
> >>> don't think the end user will see the change.  It's an optimization
> >>> for interfacing with the storage backend.
> >>>
> >>> --Alex
> >>>
> >>>> -----Original Message-----
> >>>> From: Marcus Sorensen [mailto:shadow...@gmail.com]
> >>>> Sent: Wednesday, September 18, 2013 9:22 AM
> >>>> To: dev@cloudstack.apache.org
> >>>> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
> >>>>
> >>>> Perhaps he needs to elaborate on the use case and what he means by
> >>>> more efficient.  He may be referring to multiple volumes in the sense
> >>>> of snapshotting the ROOT disks for 10 different VMs.
> >>>>
> >>>> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
> >>>> <darren.s.sheph...@gmail.com> wrote:
> >>>>> Here's my general concern about multiple volume snapshots at once.
> >>>>> Giving such a feature leads the user to believe that snapshotting
> >>>>> multiple volumes at once will give them consistency across the
> >>>>> volumes in the snapshot.
> >>>>> This is not true; it is difficult to do with many hypervisors and
> >>>>> typically requires an agent in the VM.  A single snapshot, as exists
> >>>>> today, is really crash consistent, meaning that there may exist
> >>>>> unsync'd data.  To do a true multi-volume snapshot requires a
> >>>>> "quiesce" functionality in the VM.
> >>>>> So you pause the I/O queues, fsync, fsync, snapshot, snapshot, then
> >>>>> unpause I/O.
> >>>>>
> >>>>> I might be fine with the option of allowing multiple volumeIds to
> >>>>> be specified in the snapshot API, but it needs to be clear that
> >>>>> those snapshots may be taken sequentially and that they are all
> >>>>> independently crash consistent.  But if you make that clear, then
> >>>>> why even have the API?
> >>>>> Essentially it is the same as doing multiple snapshot API commands.
> >>>>>
> >>>>> So really I would lean towards having the multiple snapshotting
> >>>>> supported in the driver or storage subsystem, but not exposed to the
> >>>>> user.  You can easily accomplish it by having a timed window on
> >>>>> snapshotting.  So every 10 seconds you do snapshots; if 5 requests
> >>>>> have queued in the last 10 seconds, you do them all at once.  This
> >>>>> could be implemented as a framework thing.
> >>>>> If your provider implements a "SnapshotBatching" interface and that
> >>>>> has a getBatchWindowTime(), then the framework can detect that it
> >>>>> should try to queue up some snapshot requests and send them to the
> >>>>> driver in a batch.  Or that could be implemented in the driver
> >>>>> itself.  I would lean toward doing it in the driver and if that goes
> >>>>> well, we look at pulling the functionality into core ACS.
> >>>>>
> >>>>> Darren
> >>>>>
> >>>>>
> >>>>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
> >>>>> chris.su...@netapp.com> wrote:
> >>>>>
> >>>>>> I would like to raise for discussion the idea of adding a couple
> >>>>>> of methods to the Storage Subsystem API interface. Currently,
> >>>>>> takeSnapshot() and revertSnapshot() only support single VM volumes.
> >>>>>> We have a use case for snapshotting multiple VM volumes at the same
> >>>>>> time. For us, it is more efficient to snapshot them all at once
> >>>>>> rather than snapshot VM volumes individually, and this seems like a
> >>>>>> more elegant solution than queueing the requests within our plugin.
> >>>>>>
> >>>>>> Based on my investigation, this should require:
> >>>>>> -Two additional APIs to be invoked from the UI
> >>>>>> -Two additional methods added to the Storage Subsystem API interface
> >>>>>> -Changes between the API level and the invoked Storage Subsystem
> >>>>>> API implementations (I know this is broad and vague), mainly around
> >>>>>> the SnapshotManager/Impl
> >>>>>>
> >>>>>> There are a couple of topics we would like discussion on:
> >>>>>> -Would this be beneficial/detrimental/neutral to other storage
> >>>>>> providers?
> >>>>>> -How should we handle the addition of new methods to the Storage
> >>>>>> Subsystem API interface? Default them to throw an
> >>>>>> UnsupportedOperationException? Default to calling the single VM
> >>>>>> volume version multiple times?
> >>>>>> -Does anyone see any issues with allowing multiple snapshots to be
> >>>>>> taken at the same time or letting storage providers have a list of
> >>>>>> all the requested volumes to back up?
> >>>>>>
> >>>>>> Please let me know if I've missed any major topics for discussion or
> >>>>>> if anything needs clarification.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chris
> >>>>>> --
> >>>>>> Chris Suich
> >>>>>> chris.su...@netapp.com
> >>>>>> NetApp Software Engineer
> >>>>>> Data Center Platforms – Cloud Solutions
> >>>>>> Citrix, Cisco & Red Hat
> >>>>>>
> >>>>>>
> >>
> >>
>
>
