You should also consider the case where the management server that initiated the scale-up fails in the middle of the procedure.
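To make that concrete, one way to make the operation recoverable is to persist each scale-up as a work item with an explicit state, so that a management server coming back up can find and roll back whatever it left in flight. A minimal sketch only; all the names here are hypothetical, not existing CloudStack classes:

import java.util.List;

// Each scale-up is persisted before any hypervisor call is made.
enum ScaleUpState { PREPARED, CAPACITY_RESERVED, HV_CALL_SENT, COMMITTED, ROLLED_BACK }

class ScaleUpWorkItem {
    long vmId;
    long oldOfferingId;
    long newOfferingId;
    ScaleUpState state;
    long ownerMsId; // management server that initiated the operation
}

interface ScaleUpWorkItemDao {
    // Returns items owned by this management server that never reached COMMITTED.
    List<ScaleUpWorkItem> findInFlightByMsId(long msId);
    void update(ScaleUpWorkItem item);
}

class ScaleUpRecovery {
    private final ScaleUpWorkItemDao dao;

    ScaleUpRecovery(ScaleUpWorkItemDao dao) {
        this.dao = dao;
    }

    // Called when a management server (re)starts.
    void recover(long msId) {
        for (ScaleUpWorkItem item : dao.findInFlightByMsId(msId)) {
            // Release any reserved capacity and restore the old offering in
            // the DB; the VM itself is unchanged on the HV side, so it stays
            // in a sane state.
            item.state = ScaleUpState.ROLLED_BACK;
            dao.update(item);
        }
    }
}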
In general I would avoid locks as much as possible unless there is a danger of corrupting the database. If, for example, the capacity calculation says there is enough capacity for the scale-up operation but it turns out there isn't when the operation is actually executed, we can return a failure for the async job. Failures at the resource layer also raise the question: are we going to retry? If so, how many times? Which failures will require a retry, and which ones will cause an abort?

I've also thought about the API design (changing it from synchronous to asynchronous). I believe this is a big semantic change. I would introduce a new API for this behavior.

On 1/22/13 5:54 AM, "Nitin Mehta" <nitin.me...@citrix.com> wrote:

>Chiradeep - thanks for your questions. Please find my answers inline. I might not have the best solutions for some of them, so I'm looking for some guidance as well.
>All these interactions make for good test cases, so the QA for this feature should test these interactions.
>
>On 22/01/13 12:18 PM, "Chiradeep Vittal" <chiradeep.vit...@citrix.com> wrote:
>
>>What usage events will be generated to ensure that proper billing is done?
>
>Currently the usage implementation won't be able to handle this. I will introduce a new event to handle it both dynamically and statically (for HVs that do not support dynamic scaling).
>
>>We need more details about the actual classes being affected and the layers at which the orchestration is being done.
>
>I will put that information in the FS. It will be based on the flowchart in the FS.
>
>>What are the possible interactions that you are taking care of?
>> - user powers off the vm during the operation
>
>Need to explore how the HV handles it. The HV should handle it gracefully; otherwise it's a bug in the HV. If the HV handles it cleanly then CS can surface it accordingly and take corrective actions like updating the DB etc.
>
>> - race conditions during calculation of sufficient capacity
>
>It will be implemented in parallel to how we do it in the allocators. As soon as we find a suitable destination we lock the capacity, and release it in case of failure. All this goes through state machine transitions, so it will be done cleanly.
>
>> - failure during live migration
>
>As you can see in the flowchart in the FS, I won't be touching live migration but will leverage the current implementation. I am assuming that we already handle it gracefully, but I will definitely test it.
>
>> - HA event during live migration / upgrade
>
>I am assuming that we already handle the HA event during live migration, since this scenario exists in the current implementation. I will try to leverage the same for an HA event during a VM upgrade.
>
>> - scheduled snapshot of volumes during the operation
>
>For VMware, the entire VM is locked by the HV and this can be an issue. I will leverage the current implementation for existing interactions, like scheduled snapshot events during live migration, and replicate the same.
>
>> - attach / detach during the operation
>
>Same thing as the above.
>
>> - hypervisor fails the upgrade.
>
>The HV should handle it gracefully; otherwise it's a bug in the HV. If the HV handles it cleanly then CS can surface it accordingly and take corrective actions like updating the DB.
>
>>The idea is not to handle every possible scenario, but to ensure that the VM (and system) is in a sane, recoverable state after the unexpected interaction.
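The reserve-and-release flow Nitin describes for the capacity race could look roughly like this. HostCapacity is illustrative only; the real allocators track capacity in the DB rather than in memory, but the check-reserve-release sequence is the same idea:

import java.util.concurrent.atomic.AtomicLong;

class HostCapacity {
    private final long totalMhz;
    private final AtomicLong reservedMhz = new AtomicLong();

    HostCapacity(long totalMhz) {
        this.totalMhz = totalMhz;
    }

    // Atomically reserve the extra CPU the scale-up needs, or fail fast.
    boolean tryReserve(long deltaMhz) {
        while (true) {
            long current = reservedMhz.get();
            if (current + deltaMhz > totalMhz) {
                return false; // not enough capacity: fail the async job
            }
            if (reservedMhz.compareAndSet(current, current + deltaMhz)) {
                return true;  // capacity locked for this operation
            }
        }
    }

    // Undo the reservation if the hypervisor call fails.
    void release(long deltaMhz) {
        reservedMhz.addAndGet(-deltaMhz);
    }
}

The caller reserves before sending the command to the hypervisor and releases on every failure path, mirroring the state machine transitions mentioned above.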
>>
>>On 1/21/13 10:01 PM, "Koushik Das" <koushik....@citrix.com> wrote:
>>
>>>See inline for 1.
>>>
>>>-----Original Message-----
>>>From: Hari Kannan [mailto:hari.kan...@citrix.com]
>>>Sent: Tuesday, January 22, 2013 10:51 AM
>>>To: cloudstack-dev@incubator.apache.org
>>>Subject: RE: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>
>>>Hello Nitin, Koushik,
>>>
>>>I'm following up on this feature - is the FS located here still accurate/up to date?
>>>
>>>I also wish to get clarification on a couple of things:
>>>
>>>1. There is a reference - open issue 1: "Ability to mark the VM for scale up at creation time" - what is the intent behind this capability? Why can't every VM be capable of scaling? Also, given that the capability of scaling up is actually a property of {OS, hypervisor}, what would be the intent behind having this as a property of a service offering? How was this "closed"?
>>>
>>>[Koushik] For all HVs the ability to dynamically increase RAM/CPU needs to be explicitly enabled. This may mean that for some/all HVs there is some overhead in terms of performance/capacity planning etc. (came across the following for VMware: http://www.yellow-bricks.com/2012/01/16/enabling-hot-add-by-default-cc-gabvirtualworld/). As a starting point I would like to have it enabled by default for all VMs. But later it may be required to attach some premium to this kind of offering.
>>>
>>>2. We also know that XS and KVM support this for Linux (the max needs to be pre-defined) - so I assume we are supporting both these platforms, in addition to VMware?
>>>3. In case there is no capacity in the cluster to scale up, just making sure that the existing VM will not be impacted, right?
>>>
>>>Hari
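For VMware, the explicit enablement Koushik describes in his answer to (1) is a per-VM reconfigure done while the VM is stopped. A sketch using the vijava vSphere bindings, illustrative only; the actual integration point in CS may differ:

import com.vmware.vim25.VirtualMachineConfigSpec;
import com.vmware.vim25.mo.Task;
import com.vmware.vim25.mo.VirtualMachine;

class HotAddEnabler {
    // Enable CPU/RAM hot-add while the VM is stopped so a later
    // scale-up can be applied while it is running.
    void enableHotAdd(VirtualMachine vm) throws Exception {
        VirtualMachineConfigSpec spec = new VirtualMachineConfigSpec();
        spec.setCpuHotAddEnabled(true);    // allow adding vCPUs at runtime
        spec.setMemoryHotAddEnabled(true); // allow raising RAM at runtime
        Task task = vm.reconfigVM_Task(spec);
        task.waitForTask();                // block until the reconfigure completes
    }
}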
>>>
>>>-----Original Message-----
>>>From: Marcus Sorensen [mailto:shadow...@gmail.com]
>>>Sent: Thursday, December 20, 2012 9:47 AM
>>>To: cloudstack-dev@incubator.apache.org
>>>Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>
>>>Oh, if it's not already obvious, we're on board for collaborating on this feature and can help implement the KVM hypervisor portions. :-)
>>>
>>>On Thu, Dec 20, 2012 at 8:44 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
>>>>
>>>> On Thu, Dec 20, 2012 at 4:52 AM, Koushik Das <koushik....@citrix.com> wrote:
>>>>
>>>>> See inline
>>>>>
>>>>> Thanks,
>>>>> Koushik
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Chip Childers [mailto:chip.child...@sungard.com]
>>>>> > Sent: Wednesday, December 19, 2012 7:55 PM
>>>>> > To: cloudstack-dev@incubator.apache.org
>>>>> > Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>>> >
>>>>> > On Wed, Dec 19, 2012 at 3:34 AM, Koushik Das <koushik....@citrix.com> wrote:
>>>>> > > See inline
>>>>> > >
>>>>> > >> -----Original Message-----
>>>>> > >> From: Marcus Sorensen [mailto:shadow...@gmail.com]
>>>>> > >> Sent: Tuesday, December 18, 2012 10:35 PM
>>>>> > >> To: cloudstack-dev@incubator.apache.org
>>>>> > >> Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>>> > >>
>>>>> > >> The FS looks good and addresses the things I'd want it to (scaling should be limited to within a cluster, use offerings).
>>>>> > >>
>>>>> > >> As you mention, there's a real problem surrounding no support for scaling down CPU, and it's just as much a problem with the guests as it is with the HVs at the moment, it seems. This makes it hard to just set a VM as a dynamic one, since at some point you'll likely trigger it to scale up and then have to reboot to get back down. My suggestion, if this goes through, is that instead of marking a VM for auto-scale, we can either attach multiple compute offerings (with a priority or "level") to a VM, along with triggers (we can't really trigger on memory, but perhaps on CPU utilization over a specific time, e.g. if CPU is at 80% for x time, fall back to the next offering), or we can create a specific single compute offering that allows you to specify a min and max memory, CPU, and a trigger at which it scales (this latter one is my preference).
>>>>> > >>
>>>>> > >> The whole thing is problematic, though, because people can inadvertently trigger their VM to scale up when they're installing updates or compiling or something, and then have to reboot to come back down. If we can't take away resources without manual intervention, we shouldn't add them. For this reason I'd like to see the focus (at least initially) on simply being able to change to larger compute offerings while the VM is up. With this in place, if someone really wants to autoscale, they can use the API, combining fetching of the VM stats with the existing changeServiceForVirtualMachine. Or we can put that in, but I think any implementation will be a poor experience without being able to go both ways.
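Marcus's fallback of driving this from outside CS can be sketched as a simple poll loop. CloudStackClient below is a hypothetical wrapper around the HTTP API; only changeServiceForVirtualMachine is an actual API name from this thread:

import java.util.HashMap;
import java.util.Map;

interface CloudStackClient {
    double cpuUsedPercent(String vmId); // e.g. from listVirtualMachines stats
    void changeServiceForVirtualMachine(String vmId, String offeringId);
}

// Fires only after CPU has stayed over the threshold for enough samples,
// so a short compile or update run does not trigger a scale-up.
class CpuTrigger {
    private final double thresholdPercent; // e.g. 80.0
    private final int requiredSamples;     // e.g. 12 one-minute samples
    private final Map<String, Integer> overCount = new HashMap<String, Integer>();

    CpuTrigger(double thresholdPercent, int requiredSamples) {
        this.thresholdPercent = thresholdPercent;
        this.requiredSamples = requiredSamples;
    }

    boolean fire(String vmId, double cpuPercent) {
        Integer count = overCount.get(vmId);
        if (count == null) {
            count = 0;
        }
        count = (cpuPercent >= thresholdPercent) ? count + 1 : 0;
        overCount.put(vmId, count);
        return count >= requiredSamples;
    }
}

class AutoScaler {
    // One polling step; scale-up only, since scaling down needs a reboot.
    static void poll(CloudStackClient cs, CpuTrigger trigger, String vmId, String nextOfferingId) {
        if (trigger.fire(vmId, cs.cpuUsedPercent(vmId))) {
            cs.changeServiceForVirtualMachine(vmId, nextOfferingId);
        }
    }
}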
>>>>> > >>
>>>>> > > This is a good suggestion but, as you have mentioned, the first priority is to have the basic stuff working (increasing CPU/RAM for running VMs).
>>>>> > > Another thing is that HVs (at least VMware) require that a VM is configured appropriately when it is stopped in order to support increasing CPU/RAM while it is running. We can either do this for all VMs, irrespective of whether the CPU/RAM is actually going to be increased, OR do it only for selected VMs (maybe based on the compute offering). If this is going to be common across all HVs, the latter can be done.
>>>>>
>>>> I think it could be done either way. The straightforward way is via an offering that allows max/current CPU and max/current RAM to be entered (basically exposing how the hypervisor settings themselves work). But you could also do a global setting of some sort that says 'set everything to a max of X CPU and Y RAM', so that every service offering can be upgraded live. As you mention, it will require at least a restart of the VMs to apply, so perhaps users could just switch service offerings anyway. It could be handy to allow people to upgrade a service offering when it was unplanned for, though.
>>>>
>>>>> > >> I don't know, maybe I'm off in left field here; I'd be interested in hearing the thoughts of others.
>>>>> > >>
>>>>> > >> You mention 'upgradeVirtualMachine'; it should be mentioned that on the customer-facing API this is called 'changeServiceForVirtualMachine', just to reduce confusion.
>>>>> > >>
>>>>> > > upgradeVirtualMachine is an existing command (see UpgradeVMCmd.java); I was planning to reuse it. But yes, if the name sounds confusing we can deprecate it and create a new command with the name you have suggested.
>>>>> > >
>>>>> > Please don't break backward compatibility without the whole list discussing the implications on a dedicated thread. We had previously agreed that we were going to maintain API compatibility between 4.0.0-incubating and our next feature release. If we break it, we have to release as 5.0.0-incubating instead of 4.1.0-incubating.
>>>>>
>>>>> In that case I will add a new async API, changeServiceForVirtualMachine (or whatever better name anyone comes up with), which will work for both running and stopped VMs. upgradeVirtualMachine would continue to exist till 5.0.0 happens.
>>>>
>>>> Would this break backward compatibility? If an API call goes from upgrading VMs only while they're off to still upgrading VMs while they're off, but also upgrading VMs with a newer, specific service offering type while they're on, does that break backward compatibility? Or let's say we simply removed the check that the VM is off and instead just checked whether the VM was started with the newer compatible settings... would that break backward compatibility? The call still does what it did before when used as before (changes the service offering while the VM is off).
>>>>
>>>> Regarding upgradeVirtualMachine, I saw no mention of it in the API docs, and found that in the code changeServiceForVirtualMachine was mapped to UpgradeVMCmd.java, which is why I mentioned the confusion. 'upgradeVirtualMachine' only exists as an internal method of the userVmService. See the file "client/tomcatconf/commands.properties.in":
>>>>
>>>> changeServiceForVirtualMachine=com.cloud.api.commands.UpgradeVMCmd
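Given Chip's constraint, one compatible option is to leave that mapping untouched and register a separate async command for the running-VM case. A rough sketch; ScaleVMCmd and scaleVirtualMachine are hypothetical names, not from the source tree:

// commands.properties.in could then carry both entries, e.g.:
//   changeServiceForVirtualMachine=com.cloud.api.commands.UpgradeVMCmd  (unchanged 4.0.0 semantics)
//   scaleVirtualMachine=com.cloud.api.commands.ScaleVMCmd               (new, async)
public class ScaleVMCmd {
    private Long id;                // the VM to scale
    private Long serviceOfferingId; // offering with the larger CPU/RAM

    // As an async command this would return a job id immediately; the job
    // fails cleanly if the current host cannot fit the new offering.
    public void execute() {
        // orchestration entry point
    }
}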
>>>>
>>>>> > >> On Tue, Dec 18, 2012 at 9:18 AM, Koushik Das <koushik....@citrix.com> wrote:
>>>>> > >>
>>>>> > >> > Created first draft of the FS
>>>>> > >> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Dynamic+scaling+of+CPU+and+RAM
>>>>> > >> > Also created the jira issue
>>>>> > >> > https://issues.apache.org/jira/browse/CLOUDSTACK-658
>>>>> > >> >
>>>>> > >> > Comments? There is an 'open issue' section where I have mentioned some issues that need to be closed.
>>>>> > >> >
>>>>> > >> > Thanks,
>>>>> > >> > Koushik
>>>>> > >> >
>>>>> > >> > > -----Original Message-----
>>>>> > >> > > From: Koushik Das [mailto:koushik....@citrix.com]
>>>>> > >> > > Sent: Saturday, December 15, 2012 11:14 PM
>>>>> > >> > > To: cloudstack-dev@incubator.apache.org
>>>>> > >> > > Subject: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>>> > >> > >
>>>>> > >> > > Currently CS supports changing CPU and RAM for a stopped VM. This is achieved by changing the compute offering of the VM (with new CPU and RAM values) and then starting it. I am planning to extend the same to running VMs as well. Initially I am planning to do it for VMware, where CPU and RAM can be dynamically increased. Support for other HVs can also be added if they support increasing CPU/RAM.
>>>>> > >> > >
>>>>> > >> > > Assuming that in the updated compute offering only CPU and RAM have changed, the deployment planner can either select the same host, in which case the values are dynamically scaled up, OR a different one, in which case the operation fails. In the future, if there is support for live migration (provided the HV supports it), another option in the latter case could be to migrate the VM first and then scale it up.
>>>>> > >> > >
>>>>> > >> > > I will start working on the FS and share it out sometime next week.
>>>>> > >> > >
>>>>> > >> > > Comments/suggestions?
>>>>> > >> > >
>>>>> > >> > > Thanks,
>>>>> > >> > > Koushik
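The planner decision in that last mail reduces to a same-host check until live migration is wired in; a minimal sketch with illustrative names:

class Host {
    final long id;

    Host(long id) {
        this.id = id;
    }
}

class ScaleUpPlanner {
    // currentHost: where the VM is running; plannedHost: what the
    // deployment planner picked for the new offering.
    void scaleUp(Host currentHost, Host plannedHost) {
        if (plannedHost.id != currentHost.id) {
            // Future option: live-migrate first, then scale up.
            throw new IllegalStateException("insufficient capacity on the current host");
        }
        // Same host: dynamically raise CPU/RAM via the hypervisor call.
    }
}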