You should also consider the case where the management server that initiated the scale-up fails in the middle of the procedure.
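To make that concrete, one way to make the operation recoverable is to persist each scale-up as a work item with an explicit state, so that a management server coming back up can find and roll back whatever it left in flight. A minimal sketch only; all the names here are hypothetical, not existing CloudStack classes:

import java.util.List;

// Each scale-up is persisted before any hypervisor call is made.
enum ScaleUpState { PREPARED, CAPACITY_RESERVED, HV_CALL_SENT, COMMITTED, ROLLED_BACK }

class ScaleUpWorkItem {
    long vmId;
    long oldOfferingId;
    long newOfferingId;
    ScaleUpState state;
    long ownerMsId; // management server that initiated the operation
}

interface ScaleUpWorkItemDao {
    // Returns items owned by this management server that never reached COMMITTED.
    List<ScaleUpWorkItem> findInFlightByMsId(long msId);
    void update(ScaleUpWorkItem item);
}

class ScaleUpRecovery {
    private final ScaleUpWorkItemDao dao;

    ScaleUpRecovery(ScaleUpWorkItemDao dao) {
        this.dao = dao;
    }

    // Called when a management server (re)starts.
    void recover(long msId) {
        for (ScaleUpWorkItem item : dao.findInFlightByMsId(msId)) {
            // Release any reserved capacity and restore the old offering in
            // the DB; the VM itself is unchanged on the HV side, so it stays
            // in a sane state.
            item.state = ScaleUpState.ROLLED_BACK;
            dao.update(item);
        }
    }
}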
In general I would avoid locks as much as possible unless there is a danger of corrupting the database. If, for example, the capacity calculation says there is enough capacity for the scale-up operation but it turns out there isn't when the operation is actually executed, we can return a failure for the async job. Failures at the resource layer also raise the question: are we going to retry? If so, how many times? Which failures will require a retry, and which ones will cause an abort?

I've also thought about the API design (changing it from synchronous to asynchronous). I believe this is a big semantic change. I would introduce a new API for this behavior.

On 1/22/13 5:54 AM, "Nitin Mehta" <nitin.me...@citrix.com> wrote:

>Chiradeep - thanks for your questions. Please find my answers inline. I might not have the best solutions for some of them, so I'm looking for some guidance as well.
>All these interactions make for good test cases, so the QA for this feature should test these interactions.
>
>On 22/01/13 12:18 PM, "Chiradeep Vittal" <chiradeep.vit...@citrix.com> wrote:
>
>>What usage events will be generated to ensure that proper billing is done?
>
>Currently the usage implementation won't be able to handle this. I will introduce a new event to handle it both dynamically and statically (for HVs that do not support dynamic scaling).
>
>>We need more details about the actual classes being affected and the layers at which the orchestration is being done.
>
>I will put that information in the FS. It will be based on the flowchart in the FS.
>
>>What are the possible interactions that you are taking care of?
>> - user powers off the vm during the operation
>
>Need to explore how the HV handles it. The HV should handle it gracefully; otherwise it's a bug in the HV. If the HV handles it cleanly then CS can surface it accordingly and take corrective actions like updating the DB etc.
>
>> - race conditions during calculation of sufficient capacity
>
>It will be implemented in parallel to how we do it in the allocators. As soon as we find a suitable destination we lock the capacity, and release it in case of failure. All this goes through state machine transitions, so it will be done cleanly.
>
>> - failure during live migration
>
>As you can see in the flowchart in the FS, I won't be touching live migration but will leverage the current implementation. I am assuming that we already handle it gracefully, but I will definitely test it.
>
>> - HA event during live migration / upgrade
>
>I am assuming that we already handle the HA event during live migration, since this scenario exists in the current implementation. I will try to leverage the same for an HA event during a VM upgrade.
>
>> - scheduled snapshot of volumes during the operation
>
>For VMware, the entire VM is locked by the HV and this can be an issue. I will leverage the current implementation for existing interactions, like scheduled snapshot events during live migration, and replicate the same.
>
>> - attach / detach during the operation
>
>Same thing as the above.
>
>> - hypervisor fails the upgrade.
>
>The HV should handle it gracefully; otherwise it's a bug in the HV. If the HV handles it cleanly then CS can surface it accordingly and take corrective actions like updating the DB.
>
>>The idea is not to handle every possible scenario, but to ensure that the VM (and system) is in a sane, recoverable state after the unexpected interaction.
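The reserve-and-release flow Nitin describes for the capacity race could look roughly like this. HostCapacity is illustrative only; the real allocators track capacity in the DB rather than in memory, but the check-reserve-release sequence is the same idea:

import java.util.concurrent.atomic.AtomicLong;

class HostCapacity {
    private final long totalMhz;
    private final AtomicLong reservedMhz = new AtomicLong();

    HostCapacity(long totalMhz) {
        this.totalMhz = totalMhz;
    }

    // Atomically reserve the extra CPU the scale-up needs, or fail fast.
    boolean tryReserve(long deltaMhz) {
        while (true) {
            long current = reservedMhz.get();
            if (current + deltaMhz > totalMhz) {
                return false; // not enough capacity: fail the async job
            }
            if (reservedMhz.compareAndSet(current, current + deltaMhz)) {
                return true;  // capacity locked for this operation
            }
        }
    }

    // Undo the reservation if the hypervisor call fails.
    void release(long deltaMhz) {
        reservedMhz.addAndGet(-deltaMhz);
    }
}

The caller reserves before sending the command to the hypervisor and releases on every failure path, mirroring the state machine transitions mentioned above.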
>>
>>On 1/21/13 10:01 PM, "Koushik Das" <koushik....@citrix.com> wrote:
>>
>>>See inline for 1.
>>>
>>>-----Original Message-----
>>>From: Hari Kannan [mailto:hari.kan...@citrix.com]
>>>Sent: Tuesday, January 22, 2013 10:51 AM
>>>To: cloudstack-dev@incubator.apache.org
>>>Subject: RE: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>
>>>Hello Nitin, Koushik,
>>>
>>>I'm following up on this feature - is the FS located here still accurate/up to date?
>>>
>>>I also wish to get clarification on a couple of things:
>>>
>>>1. There is a reference - open issue 1: "Ability to mark the VM for scale up at creation time" - what is the intent behind this capability? Why can't every VM be capable of scaling? Also, given that the capability of scaling up is actually a property of {OS, hypervisor}, what would be the intent behind having this as a property of a service offering? How was this "closed"?
>>>
>>>[Koushik] For all HVs the ability to dynamically increase RAM/CPU needs to be explicitly enabled. This may mean that for some/all HVs there is some overhead in terms of performance/capacity planning etc. (came across the following for VMware: http://www.yellow-bricks.com/2012/01/16/enabling-hot-add-by-default-cc-gabvirtualworld/). As a starting point I would like to have it enabled by default for all VMs. But later it may be required to attach some premium to this kind of offering.
>>>
>>>2. We also know that XS and KVM support this for Linux (the max needs to be pre-defined) - so I assume we are supporting both these platforms, in addition to VMware?
>>>3. In case there is no capacity in the cluster to scale up, just making sure that the existing VM will not be impacted, right?
>>>
>>>Hari
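For VMware, the explicit enablement Koushik describes in his answer to (1) is a per-VM reconfigure done while the VM is stopped. A sketch using the vijava vSphere bindings, illustrative only; the actual integration point in CS may differ:

import com.vmware.vim25.VirtualMachineConfigSpec;
import com.vmware.vim25.mo.Task;
import com.vmware.vim25.mo.VirtualMachine;

class HotAddEnabler {
    // Enable CPU/RAM hot-add while the VM is stopped so a later
    // scale-up can be applied while it is running.
    void enableHotAdd(VirtualMachine vm) throws Exception {
        VirtualMachineConfigSpec spec = new VirtualMachineConfigSpec();
        spec.setCpuHotAddEnabled(true);    // allow adding vCPUs at runtime
        spec.setMemoryHotAddEnabled(true); // allow raising RAM at runtime
        Task task = vm.reconfigVM_Task(spec);
        task.waitForTask();                // block until the reconfigure completes
    }
}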
>>>
>>>-----Original Message-----
>>>From: Marcus Sorensen [mailto:shadow...@gmail.com]
>>>Sent: Thursday, December 20, 2012 9:47 AM
>>>To: cloudstack-dev@incubator.apache.org
>>>Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>
>>>Oh, if it's not already obvious, we're on board for collaborating on this feature and can help implement the KVM hypervisor portions. :-)
>>>
>>>On Thu, Dec 20, 2012 at 8:44 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
>>>>
>>>> On Thu, Dec 20, 2012 at 4:52 AM, Koushik Das <koushik....@citrix.com> wrote:
>>>>
>>>>> See inline
>>>>>
>>>>> Thanks,
>>>>> Koushik
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Chip Childers [mailto:chip.child...@sungard.com]
>>>>> > Sent: Wednesday, December 19, 2012 7:55 PM
>>>>> > To: cloudstack-dev@incubator.apache.org
>>>>> > Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>>> >
>>>>> > On Wed, Dec 19, 2012 at 3:34 AM, Koushik Das <koushik....@citrix.com> wrote:
>>>>> > > See inline
>>>>> > >
>>>>> > >> -----Original Message-----
>>>>> > >> From: Marcus Sorensen [mailto:shadow...@gmail.com]
>>>>> > >> Sent: Tuesday, December 18, 2012 10:35 PM
>>>>> > >> To: cloudstack-dev@incubator.apache.org
>>>>> > >> Subject: Re: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>>> > >>
>>>>> > >> The FS looks good and addresses the things I'd want it to (scaling should be limited to within a cluster, use offerings).
>>>>> > >>
>>>>> > >> As you mention, there's a real problem surrounding no support for scaling down CPU, and it's just as much a problem with the guests as it is with the HVs at the moment, it seems. This makes it hard to just set a VM as a dynamic one, since at some point you'll likely trigger it to scale up and then have to reboot to get back down. My suggestion, if this goes through, is that instead of marking a VM for auto-scale, we can either attach multiple compute offerings (with a priority or "level") to a VM, along with triggers (we can't really trigger on memory, but perhaps on CPU utilization over a specific time, e.g. if CPU is at 80% for x time, fall back to the next offering), or we can create a specific single compute offering that allows you to specify a min and max memory, CPU, and a trigger at which it scales (this latter one is my preference).
>>>>> > >>
>>>>> > >> The whole thing is problematic, though, because people can inadvertently trigger their VM to scale up when they're installing updates or compiling or something, and then have to reboot to come back down. If we can't take away resources without manual intervention, we shouldn't add them. For this reason I'd like to see the focus (at least initially) on simply being able to change to larger compute offerings while the VM is up. With this in place, if someone really wants to autoscale, they can use the API, combining fetching of the VM stats with the existing changeServiceForVirtualMachine. Or we can put that in, but I think any implementation will be a poor experience without being able to go both ways.
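Marcus's fallback of driving this from outside CS can be sketched as a simple poll loop. CloudStackClient below is a hypothetical wrapper around the HTTP API; only changeServiceForVirtualMachine is an actual API name from this thread:

import java.util.HashMap;
import java.util.Map;

interface CloudStackClient {
    double cpuUsedPercent(String vmId); // e.g. from listVirtualMachines stats
    void changeServiceForVirtualMachine(String vmId, String offeringId);
}

// Fires only after CPU has stayed over the threshold for enough samples,
// so a short compile or update run does not trigger a scale-up.
class CpuTrigger {
    private final double thresholdPercent; // e.g. 80.0
    private final int requiredSamples;     // e.g. 12 one-minute samples
    private final Map<String, Integer> overCount = new HashMap<String, Integer>();

    CpuTrigger(double thresholdPercent, int requiredSamples) {
        this.thresholdPercent = thresholdPercent;
        this.requiredSamples = requiredSamples;
    }

    boolean fire(String vmId, double cpuPercent) {
        Integer count = overCount.get(vmId);
        if (count == null) {
            count = 0;
        }
        count = (cpuPercent >= thresholdPercent) ? count + 1 : 0;
        overCount.put(vmId, count);
        return count >= requiredSamples;
    }
}

class AutoScaler {
    // One polling step; scale-up only, since scaling down needs a reboot.
    static void poll(CloudStackClient cs, CpuTrigger trigger, String vmId, String nextOfferingId) {
        if (trigger.fire(vmId, cs.cpuUsedPercent(vmId))) {
            cs.changeServiceForVirtualMachine(vmId, nextOfferingId);
        }
    }
}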
>>>>> > >>
>>>>> > > This is a good suggestion but, as you have mentioned, the first priority is to have the basic stuff working (increasing CPU/RAM for running VMs).
>>>>> > > Another thing is that HVs (at least VMware) require that a VM is configured appropriately when it is stopped in order to support increasing CPU/RAM while it is running. We can either do this for all VMs, irrespective of whether the CPU/RAM is actually going to be increased, OR do it only for selected VMs (maybe based on the compute offering). If this is going to be common across all HVs, the latter can be done.
>>>>>
>>>> I think it could be done either way. The straightforward way is via an offering that allows max/current CPU and max/current RAM to be entered (basically exposing how the hypervisor settings themselves work). But you could also do a global setting of some sort that says 'set everything to a max of X CPU and Y RAM', so that every service offering can be upgraded live. As you mention, it will require at least a restart of the VMs to apply, so perhaps users could just switch service offerings anyway. It could be handy to allow people to upgrade a service offering when it was unplanned for, though.
>>>>
>>>>> > >> I don't know, maybe I'm off in left field here; I'd be interested in hearing the thoughts of others.
>>>>> > >>
>>>>> > >> You mention 'upgradeVirtualMachine'; it should be mentioned that on the customer-facing API this is called 'changeServiceForVirtualMachine', just to reduce confusion.
>>>>> > >>
>>>>> > > upgradeVirtualMachine is an existing command (see UpgradeVMCmd.java); I was planning to reuse it. But yes, if the name sounds confusing we can deprecate it and create a new command with the name you have suggested.
>>>>> > >
>>>>> > Please don't break backward compatibility without the whole list discussing the implications on a dedicated thread. We had previously agreed that we were going to maintain API compatibility between 4.0.0-incubating and our next feature release. If we break it, we have to release as 5.0.0-incubating instead of 4.1.0-incubating.
>>>>>
>>>>> In that case I will add a new async API, changeServiceForVirtualMachine (or whatever better name anyone comes up with), which will work for both running and stopped VMs. upgradeVirtualMachine would continue to exist till 5.0.0 happens.
>>>>
>>>> Would this break backward compatibility? If an API call goes from upgrading VMs only while they're off to still upgrading VMs while they're off, but also upgrading VMs with a newer, specific service offering type while they're on, does that break backward compatibility? Or let's say we simply removed the check that the VM is off and instead just checked whether the VM was started with the newer compatible settings... would that break backward compatibility? The call still does what it did before when used as before (changes the service offering while the VM is off).
>>>>
>>>> Regarding upgradeVirtualMachine, I saw no mention of it in the API docs, and found that in the code changeServiceForVirtualMachine was mapped to UpgradeVMCmd.java, which is why I mentioned the confusion. 'upgradeVirtualMachine' only exists as an internal method of the userVmService. See the file "client/tomcatconf/commands.properties.in":
>>>>
>>>> changeServiceForVirtualMachine=com.cloud.api.commands.UpgradeVMCmd
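Given Chip's constraint, one compatible option is to leave that mapping untouched and register a separate async command for the running-VM case. A rough sketch; ScaleVMCmd and scaleVirtualMachine are hypothetical names, not from the source tree:

// commands.properties.in could then carry both entries, e.g.:
//   changeServiceForVirtualMachine=com.cloud.api.commands.UpgradeVMCmd  (unchanged 4.0.0 semantics)
//   scaleVirtualMachine=com.cloud.api.commands.ScaleVMCmd               (new, async)
public class ScaleVMCmd {
    private Long id;                // the VM to scale
    private Long serviceOfferingId; // offering with the larger CPU/RAM

    // As an async command this would return a job id immediately; the job
    // fails cleanly if the current host cannot fit the new offering.
    public void execute() {
        // orchestration entry point
    }
}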
>>>>
>>>>> > >> On Tue, Dec 18, 2012 at 9:18 AM, Koushik Das <koushik....@citrix.com> wrote:
>>>>> > >>
>>>>> > >> > Created first draft of the FS
>>>>> > >> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Dynamic+scaling+of+CPU+and+RAM
>>>>> > >> > Also created the jira issue
>>>>> > >> > https://issues.apache.org/jira/browse/CLOUDSTACK-658
>>>>> > >> >
>>>>> > >> > Comments? There is an 'open issue' section where I have mentioned some issues that need to be closed.
>>>>> > >> >
>>>>> > >> > Thanks,
>>>>> > >> > Koushik
>>>>> > >> >
>>>>> > >> > > -----Original Message-----
>>>>> > >> > > From: Koushik Das [mailto:koushik....@citrix.com]
>>>>> > >> > > Sent: Saturday, December 15, 2012 11:14 PM
>>>>> > >> > > To: cloudstack-dev@incubator.apache.org
>>>>> > >> > > Subject: [DISCUSS] Scaling up CPU and RAM for running VMs
>>>>> > >> > >
>>>>> > >> > > Currently CS supports changing CPU and RAM for a stopped VM. This is achieved by changing the compute offering of the VM (with new CPU and RAM values) and then starting it. I am planning to extend the same to running VMs as well. Initially I am planning to do it for VMware, where CPU and RAM can be dynamically increased. Support for other HVs can also be added if they support increasing CPU/RAM.
>>>>> > >> > >
>>>>> > >> > > Assuming that in the updated compute offering only CPU and RAM have changed, the deployment planner can either select the same host, in which case the values are dynamically scaled up, OR a different one, in which case the operation fails. In the future, if there is support for live migration (provided the HV supports it), another option in the latter case could be to migrate the VM first and then scale it up.
>>>>> > >> > >
>>>>> > >> > > I will start working on the FS and share it out sometime next week.
>>>>> > >> > >
>>>>> > >> > > Comments/suggestions?
>>>>> > >> > >
>>>>> > >> > > Thanks,
>>>>> > >> > > Koushik
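The planner decision in that last mail reduces to a same-host check until live migration is wired in; a minimal sketch with illustrative names:

class Host {
    final long id;

    Host(long id) {
        this.id = id;
    }
}

class ScaleUpPlanner {
    // currentHost: where the VM is running; plannedHost: what the
    // deployment planner picked for the new offering.
    void scaleUp(Host currentHost, Host plannedHost) {
        if (plannedHost.id != currentHost.id) {
            // Future option: live-migrate first, then scale up.
            throw new IllegalStateException("insufficient capacity on the current host");
        }
        // Same host: dynamically raise CPU/RAM via the hypervisor call.
    }
}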