[openstack-dev] [nova][publiccloud-wg] Proposal to shelve on stop/suspend

Matt Riedemann Fri, 14 Sep 2018 16:26:31 -0700

tl;dr: I'm proposing a new parameter to the server stop (and suspend?) APIs to control if nova shelve offloads the server.

Long form: This came up during the public cloud WG session this week based on a couple of feature requests [1][2]. When a user stops/suspends a server, the hypervisor frees up resources on the host but nova continues to track those resources as being used on the host so the scheduler can't put more servers there. What operators would like to do is that when a user stops a server, nova actually shelve offloads the server from the host so they can schedule new servers on that host. On start/resume of the server, nova would find a new host for the server. This also came up in Vancouver where operators would like to free up limited expensive resources like GPUs when the server is stopped. This is also the behavior in AWS.

The problem with shelve is that it's great for operators but users just don't use it, maybe because they don't know what it is and stop works just fine. So how do you get users to opt into shelving their server?

I've proposed a high-level blueprint [3] where we'd add a new (microversioned) parameter to the stop API with three options:


* auto
* offload
* retain

Naming is obviously up for debate. The point is we would default to auto, and if auto is used, the API checks a config option to determine the behavior: offload or retain. By default we would retain for backward compatibility. For users that don't care, they get auto and it's fine. Users that do care either (1) don't opt into the microversion or (2) specify the behavior they want. I don't think we need to expose what the cloud's configuration for auto is because again, if you don't care then it doesn't matter and if you do care, you can opt out of this.
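To make the auto/offload/retain resolution concrete, here is a rough sketch of how the API might pick the effective behavior. Nothing here is real nova code: the function name, the `config_default` argument (standing in for the config option) and the use of `None` for "request made without the new microversion" are all assumptions for illustration.

```python
from typing import Optional

# The three proposed values for the new stop parameter (naming is up for debate).
VALID_BEHAVIORS = ("auto", "offload", "retain")


def resolve_stop_behavior(requested: Optional[str],
                          config_default: str = "retain") -> str:
    """Return the effective behavior, either 'offload' or 'retain'.

    requested is None when the caller did not opt into the new
    microversion (hypothetical convention for this sketch).
    config_default stands in for the operator's config option.
    """
    if requested is None:
        # Old microversion: keep today's behavior for backward compatibility.
        return "retain"
    if requested not in VALID_BEHAVIORS:
        raise ValueError("unknown stop behavior: %r" % (requested,))
    if requested == "auto":
        # 'auto' defers to whatever the cloud is configured to do.
        if config_default not in ("offload", "retain"):
            raise ValueError("bad config default: %r" % (config_default,))
        return config_default
    # The user asked for a specific behavior, so honor it.
    return requested
```

With this shape, a cloud configured to offload only affects requests that say auto; an explicit retain, or a request on an older microversion, still keeps the server on its host.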


"How do we get users to use the new microversion?" I'm glad you asked.

Well, nova CLI defaults to using the latest available microversion negotiated between the client and the server, so by default, anyone using "nova stop" would get the 'auto' behavior (assuming the client and server are new enough to support it). Long-term, openstack client plans on doing the same version negotiation.
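For anyone unfamiliar with the negotiation, the idea is roughly "use the highest version both sides support". The sketch below is not the novaclient implementation; the helper names are made up, and the microversion that would introduce the new parameter does not exist yet, so it is passed in as an argument rather than hard-coded.

```python
from typing import Optional, Tuple


def parse_microversion(version: str) -> Tuple[int, int]:
    """Turn 'X.Y' into (X, Y) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def negotiate(client_min: str, client_max: str,
              server_min: str, server_max: str) -> Optional[str]:
    """Return the highest microversion both sides support, or None."""
    low = max(parse_microversion(client_min), parse_microversion(server_min))
    high = min(parse_microversion(client_max), parse_microversion(server_max))
    if low > high:
        # The supported ranges do not overlap.
        return None
    return "%d.%d" % high


def supports_stop_behavior(negotiated: str, required: str) -> bool:
    """True if the negotiated version is at least the (hypothetical)
    microversion that adds the stop behavior parameter."""
    return parse_microversion(negotiated) >= parse_microversion(required)
```

If the negotiated version is too old, the client would simply send a plain stop and get the retain behavior it gets today.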

As for the server status changes, if the server is stopped and shelved, the status would be 'SHELVED_OFFLOADED' rather than 'SHUTDOWN'. I believe this is fine especially if a user is not being specific and doesn't care about the actual backend behavior. On start, the API would allow starting (unshelving) shelved offloaded (rather than just stopped) instances. Trying to hide shelved servers as stopped in the API would be overly complex IMO so I don't want to try and mask that.
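A small sketch of the status handling described above, assuming the effective behavior has already been resolved to offload or retain. The function names are hypothetical and the status strings are copied from this message as written, so treat them as illustrative rather than as what a deployment reports.

```python
# Statuses from which the proposed start API would accept a request.
STARTABLE_STATUSES = ("SHUTDOWN", "SHELVED_OFFLOADED")


def status_after_stop(behavior: str) -> str:
    """Status the user would see after stopping with the given behavior."""
    if behavior == "offload":
        return "SHELVED_OFFLOADED"
    if behavior == "retain":
        return "SHUTDOWN"
    raise ValueError("behavior must be 'offload' or 'retain'")


def start_action(status: str) -> str:
    """Pick what 'start' does based on the server's current status."""
    if status not in STARTABLE_STATUSES:
        raise ValueError("cannot start a server in status %s" % status)
    # A shelved offloaded server has no host, so starting it means
    # unshelving, which sends it back through the scheduler.
    return "unshelve" if status == "SHELVED_OFFLOADED" else "start"
```

The shelved status is reported as-is rather than being disguised as stopped, which is the point of not masking it in the API.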

It is possible that a user who stopped and shelved their server could hit a NoValidHost when starting (unshelving) the server, but that really shouldn't happen in a cloud that's configuring nova to shelve by default, because if they are doing this, their SLA needs to reflect that they have the capacity to unshelve the server. If you can't honor that SLA, don't shelve by default.

So, what are the general feelings on this before I go off and start writing up a spec?


[1] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681
[2] https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791679
[3] https://blueprints.launchpad.net/nova/+spec/shelve-on-stop

--

Thanks,

Matt

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
