*TL;DR*: I'm seeing the flexible environment autoscaler start a machine, then stop it 2 minutes later, before it has even finished starting the application. Is this a misconfiguration on my side or a bug? Could the issue be that our application loads too slowly? Can I reconfigure this somehow? We see lots of instances "fail to start" like this, and I can't figure out why exactly.
*Longer description* We've been experimenting with moving parts of our application to the new "Flexible" environment. So far it works well! However, I've noticed that the autoscaler is extremely aggressive at stopping instances, which contradicts the documentation. I'm assuming the Flexible environment using the GCE autoscaler. The documentation there states "the 10 minute delay prevents virtual machine instances from being terminated before the instance has been running for 10 minutes. Cool down periods for new instances are ignored when deciding whether to scale down a group": https://cloud.google.com/compute/docs/autoscaler/understanding-autoscaler-decisions However, I've been looking at when instances are created and stopped, and I see many that get stopped after ~2-3 minutes before the machine has even booted! It seems to me from the description above, it should let my machine run for at least 10 minutes before stopping it. Could this be getting killed by some sort of health checking? This module is using the defaults, which looks like it should wait at least 5 minutes before declaring an instance unhealthy. I've actually seen a number of other issues that seem to cause VMs to fail to start: * Long sequence of "API is not available yet because of exception: The remote RPC to the application server failed for call modules.GetNumInstances()" errors, followed by shutdown after about *Technical details* In the compute engine API logs, I see some create/delete records like the following, showing an instance created then deleted within ~2 minutes: 17:11:15.264 compute.instances.insert type:GCE_API_CALL resource id:"2166106899089084061" 17:11:51.535 type:GCE_OPERATION_DONE (matching the previous call) 17:13:40.427 compute.instances.delete type:GCE_API_CALL resource id:"2166106899089084061" 17:14:30.572 type":GCE_OPERATION_DONE When I search for logs from this instance itself, the last messages across all logs are from syslog, showing it was still running the "startup" process provided by Google when the shutdown signal from the delete operation was executed. It seems from this log it didn't even get to our code; this is killed while running Google's code I think. I see similar issues where instances run for ~5 minutes, and have a sequence of errors related to "api_verifier" returning many errors. Any ideas what is causing this, and what we can do to make it happen less often? Thank you! Apr 20 21:11:50 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:11:50 Pulling GAE_FULL_APP_CONTAINER: us.gcr.io/TRUNCATED@sha256:f70b09a8217430bde2e45d6ccc868e3699b92f965eff12cc6ed19865ce131585 Apr 20 21:12:03 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 27.964060] docker0: port 1(veth75e5c89) entered forwarding state Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 72.892834] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the protection by the permission bits on the lower branch Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: 'build_app': skipping for gcloud based deployment. Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: 'patch-dockerfile': skipping for empty patch. Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 73.034744] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the protection by the permission bits on the lower branch Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'patch_dockerfile'. Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 73.348051] docker0: port 2(veth0dcc920) entered forwarding state Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 73.076657] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the protection by the permission bits on the lower branch Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'api_verifier'. Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 'build_app'. Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 73.354433] docker0: port 2(veth0dcc920) entered forwarding state Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'build_app'. Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 73.094180] device veth0dcc920 entered promiscuous mode Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 'pull_app'. Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 starting api_verifier Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 checking the status of api_verifier Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 73.106564] IPv6: ADDRCONF(NETDEV_UP): veth0dcc920: link is not ready Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 73.340512] IPv6: ADDRCONF(NETDEV_CHANGE): veth0dcc920: link becomes ready Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 Done pulling app container Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 'patch_dockerfile'. Apr 20 21:13:03 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 88.380041] docker0: port 2(veth0dcc920) entered forwarding state Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 init: Switching to runlevel: 0 Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 shutdown[3370]: shutting down for system halt Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 shutdown[3352]: shutting down for system halt Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 shutdownscript: Running shutdown script /var/run/google.shutdown.script Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 iscsid: iscsid shutting down. Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 shutdownscript: Finished running shutdown script /var/run/google.shutdown.script Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 google: shutdown script found in metadata. --------------------------------------------------------------------- --------------------------Tail of app logs--------------------------- " % Total % Received % Xferd Average Speed Time Time Time Current" --------------------------------------------------------------------- Current app health: 0 "---------------App was unhealthy, grabbing debug logs----------------" App logs not found. " Dload Upload Total Spent Left Speed" Sending SIGUSR1 to fluentd to trigger a log flush. App shutdown handlers done. Container CID file not found. --------------------------------------------------------------------- " 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) couldn't connect to host" 47e7078ed32619023f23f15022d4965381f0abc634ea9e48d173e312beb43b51 Triggering app shutdown handlers. -------------------------VM shutdown initiated------------------------ -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/70dd6af8-5c91-4830-b6f0-0f89ac09ddca%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
