[google-appengine] Re: min_instances, min_idle_instances, and old versions

Alan deLespinasse Mon, 28 Sep 2020 14:43:41 -0700

tl;dr: *Never use min_instances!* It will just increase your bill 
unnecessarily.

On Thursday, September 17, 2020 at 2:26:50 PM UTC-4 Olu wrote:

>
> To start with, I can confirm that you would be billed for all Instances in 
> use, whether or not they are actively serving requests. 
>
> I will attempt to respond to your inquiries as I have highlighted them 
> below:
>
> 1. Is the information shared on this link[1] accurate?
>
> A: It is not exactly clear which part of the information you are looking 
> to verify. However, I assume you are trying to confirm the explanation 
> about min_instances and min_idle_instances. If so, yes, the information is 
> accurate as those words were copied verbatim from the Documentation[2][3]. 
> If not, please reply to this thread. 
>

Sorry, I guess I was referring to something implied by that link, not 
directly stated, which is that setting min_instances to 1 or more will 
result in instances never getting shut down in old versions, even if they 
are not receiving traffic.

> 2. if an auto-scaled service has min_instances set to nonzero, does that 
> mean that instances in old versions don't get shut down when you deploy a 
> new version? And those instances get billed?
>
> A: I believe this article[4] explains in detail how Instances are managed, 
> particularly on scaling down. Scaling down Instances depends on the decrease 
> in request volume. Typically, the App Engine Standard environment scales 
> down to 0[5] and, as explained here[6], if the scheduler shuts down 
> active instances due to lack of requests being handled, another instance 
> will not start until prompted by an external request, even with 
> min_instances set.
>
> With all that being said, as explained in this documentation[7], the 
> default behavior of App Engine Standard is that whenever a new application 
> version is deployed, unless the --no-promote flag is used in the 
> deployment, the newly deployed version is automatically configured to 
> receive 100% of traffic. So, with no traffic to the older version, the 
> scheduler would shut down the instances due to lack of requests, even if 
> min_instances is set to nonzero.
>

Obviously setting min_instances overrides the default behavior of scaling 
down to zero. And I'm now convinced that, with min_instances set to 
nonzero, it doesn't scale down to zero even in obsolete versions that are 
set to receive no traffic. This isn't documented behavior, but I've seen it 
implied elsewhere (like the Server Fault page above), and it was more or 
less confirmed by the agent who handled my billing complaint (they checked 
with support engineers, I believe).
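
To make the failure mode concrete, here is a minimal sketch of the check I 
would want to run after each deploy: flag any version that receives no 
traffic but still has running instances. The record fields (id, 
traffic_split, instance_count) and the sample data are hypothetical and 
made up for illustration; they are not the output format of any real tool.

```python
def find_zombie_versions(versions):
    """Return IDs of versions that get no traffic but still have instances.

    `versions` is a list of dicts with hypothetical keys:
    "id", "traffic_split" (0.0 to 1.0) and "instance_count".
    """
    return [
        v["id"]
        for v in versions
        if v["traffic_split"] == 0 and v["instance_count"] > 0
    ]


# Hypothetical example data, not real output from any tool.
example_versions = [
    {"id": "v1", "traffic_split": 0.0, "instance_count": 1},  # old, still billed
    {"id": "v2", "traffic_split": 1.0, "instance_count": 2},  # current version
    {"id": "v0", "traffic_split": 0.0, "instance_count": 0},  # old, scaled to zero
]

print(find_zombie_versions(example_versions))
```

Any version this returns is one that is still being billed even though it 
serves no requests.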

(As the documentation mentions, "For this feature to function properly, you 
must make sure that warmup requests are enabled and that your application 
handles warmup requests." So some users may have min_instances set to more 
than zero, but not see the above problem, because their app is not actually 
configured correctly to maintain a minimum number of instances. I made this 
mistake for a while.)
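
For reference, a minimal sketch of what "handles warmup requests" can look 
like. This assumes a plain WSGI app in main.py exposed as a callable named 
app, and that warmup requests arrive as GET /_ah/warmup once 
inbound_services includes warmup. The response bodies and the 
initialization placeholder are illustrative only; a real app would more 
likely use a framework such as Flask.

```python
# main.py: minimal WSGI app (standard library only) that answers warmup requests.
# Assumption: the runtime serves the WSGI callable named `app` from this file.

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")

    if path == "/_ah/warmup":
        # Warmup request: do expensive initialization here
        # (hypothetical examples: load caches, open connections).
        status, body = "200 OK", b"warmed up"
    elif path == "/":
        status, body = "200 OK", b"hello"
    else:
        status, body = "404 Not Found", b"not found"

    start_response(status, [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The important part is only that /_ah/warmup returns a response instead of 
an error; what the handler initializes is up to the application.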

> If you are experiencing a different behavior, I suggest you reach out 
> directly to the GCP Support Engineers[8] for better evaluation of the 
> issue. 
>

So apparently I have to pay for a support plan just to get information that 
should be in the documentation...

> 3. What is min_idle_instances actually supposed to mean?
>
> A: As explained in the documentation[3], this is the number of instances 
> that keep running and ready to serve traffic. The idle Instances help to 
> avoid the effect of pending latency on your App Engine application.
>

I noticed that this documentation has recently been updated (maybe partly 
in response to my complaints?). It now says "The number of *additional* 
instances..." (emphasis mine), and goes on to explain that by "additional 
instances", it means that App Engine calculates the "necessary" number of 
instances to serve current load, and adds min_idle_instances more 
instances on top. So it does *not* mean that there will always be this many 
"idle" instances (for some definition of "idle"), as the name would imply. 
The new documentation is a big improvement. (new version 
<https://cloud.google.com/appengine/docs/standard/python3/config/appref#min_idle_instances> 
/ old version 
<https://web.archive.org/web/20200504082931if_/https://cloud.google.com/appengine/docs/standard/python3/config/appref>)
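
My reading of the updated wording, as a rough sketch. This is only a mental 
model under my own assumptions (in particular, how min_instances and 
max_instances combine with the result); it is not the scheduler's actual 
algorithm, and the function and parameter names are mine.

```python
def estimated_instances(needed_for_load, min_instances=0,
                        min_idle_instances=0, max_instances=None):
    """Rough estimate of running instances for one version.

    Assumed model: instances needed for current load, plus
    min_idle_instances extra, never below min_instances and never
    above max_instances (if set).
    """
    target = max(needed_for_load + min_idle_instances, min_instances)
    if max_instances is not None:
        target = min(target, max_instances)
    return target


# One instance needed for load, min_idle_instances: 1, max_instances: 10
print(estimated_instances(1, min_idle_instances=1, max_instances=10))  # 2

# No load at all, min_instances: 1 (the assumed "zombie" case for an old version)
print(estimated_instances(0, min_instances=1))  # 1
```

Under this model, min_idle_instances adds to whatever the load requires, 
while min_instances is a floor that applies even with no load at all.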

(Still waiting for the min_instances documentation to be updated to warn 
about the danger of zombie instances)

> As I may have alluded above, Instances are created whenever requests are 
> received. When instances are created, there are certain steps that apply 
> for the Instance to start up and be ready to attend to requests. These are 
> explained in this documentation[9][10]. Basically, having idle instances 
> helps to avoid such steps that would cause pending latency.
>
> 4.  Do I need to set max_idle_instances? 
>
> A: No, you do not need to set this parameter as it is optional. 
> Indeed, the default value of max_idle_instances is automatic, which 
> implies that the max is determined by the App Engine Autoscaler depending 
> particularly on the number of requests being handled. 
>

Sorry, I was imprecise. I wasn't asking if it's required. I was asking if I 
*should* set it, i.e., whether I might get surprises in my bill if it's not 
set. It is still not clear to me what it actually means, since 
there's no clear definition of "idle" provided, and anyway because of the 
previous confusion over min_idle_instances, I don't want to assume that it 
has anything to do with instances that are "idle". The current 
documentation implies that it has something to do with how rapidly 
instances will be scaled down after a traffic peak, but doesn't give me any 
way to quantitatively predict how rapidly it would scale down for a 
particular value. Anyway I'm not setting this for now.

For anyone reading this who's curious, this is my new production app.yaml 
file:

runtime: python37
instance_class: F4
automatic_scaling:
  min_instances: 0
  min_idle_instances: 1
  max_instances: 10
inbound_services:
- warmup

With this configuration, there are always at least 2 instances of the 
current version. We always have a minimum of 1 request per minute (from a 
cron job); I assume it would probably scale down to 1 instance if we went a 
sufficiently long time with no requests at all (I have no idea how long it 
would take). Old versions do scale down to zero instances, though sometimes 
it takes a while. For our integration and staging environments, we set 
min_idle_instances to zero and max_instances to 2, and there is always at 
least one instance (presumably it would scale down to zero if given a chance).

To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/8cc25eb9-2a1d-4f13-9863-2429cfffa796n%40googlegroups.com.
