*TL;DR*: I'm seeing the flexible environment autoscaler start a machine, 
then stop it 2 minutes later, before it has even finished starting the 
application. Is this a misconfiguration on my side or a bug? Could the 
issue be that our application loads too slowly? Can I reconfigure this 
somehow? We see lots of instances "fail to start" like this, and I can't 
figure out why exactly.


*Longer description*

We've been experimenting with moving parts of our application to the new 
"Flexible" environment. So far it works well! However, I've noticed that 
the autoscaler is extremely aggressive at stopping instances, which 
contradicts the documentation. I'm assuming the Flexible environment using 
the GCE autoscaler. The documentation there states "the 10 minute delay 
prevents virtual machine instances from being terminated before the 
instance has been running for 10 minutes. Cool down periods for new 
instances are ignored when deciding whether to scale down a group":

https://cloud.google.com/compute/docs/autoscaler/understanding-autoscaler-decisions

However, I've been looking at when instances are created and stopped, and I 
see many that get stopped after ~2-3 minutes before the machine has even 
booted! It seems to me from the description above, it should let my machine 
run for at least 10 minutes before stopping it. Could this be getting 
killed by some sort of health checking? This module is using the defaults, 
which looks like it should wait at least 5 minutes before declaring an 
instance unhealthy.

I've actually seen a number of other issues that seem to cause VMs to fail 
to start:

* Long sequence of "API is not available yet because of exception: The 
remote RPC to the application server failed for call 
modules.GetNumInstances()" errors, followed by shutdown after about 


*Technical details*

In the compute engine API logs, I see some create/delete records like the 
following, showing an instance created then deleted within ~2 minutes:


17:11:15.264 compute.instances.insert type:GCE_API_CALL resource 
id:"2166106899089084061"
17:11:51.535   type:GCE_OPERATION_DONE (matching the previous call)
17:13:40.427 compute.instances.delete type:GCE_API_CALL resource 
id:"2166106899089084061"
17:14:30.572   type":GCE_OPERATION_DONE


When I search for logs from this instance itself, the last messages across 
all logs are from syslog, showing it was still running the "startup" 
process provided by Google when the shutdown signal from the delete 
operation was executed. It seems from this log it didn't even get to our 
code; this is killed while running Google's code I think. I see similar 
issues where instances run for ~5 minutes, and have a sequence of errors 
related to "api_verifier" returning many errors.

Any ideas what is causing this, and what we can do to make it happen less 
often?

Thank you!




Apr 20 21:11:50 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:11:50 Pulling GAE_FULL_APP_CONTAINER: 
us.gcr.io/TRUNCATED@sha256:f70b09a8217430bde2e45d6ccc868e3699b92f965eff12cc6ed19865ce131585
Apr 20 21:12:03 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  27.964060] docker0: port 1(veth75e5c89) entered forwarding state
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  72.892834] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the 
protection by the permission bits on the lower branch
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: 'build_app': skipping for 
gcloud based deployment.
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: 'patch-dockerfile': 
skipping for empty patch.
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  73.034744] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the 
protection by the permission bits on the lower branch
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'patch_dockerfile'.
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  73.348051] docker0: port 2(veth0dcc920) entered forwarding state
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  73.076657] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the 
protection by the permission bits on the lower branch
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'api_verifier'.
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 'build_app'.
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  73.354433] docker0: port 2(veth0dcc920) entered forwarding state
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'build_app'.
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  73.094180] device veth0dcc920 entered promiscuous mode
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 'pull_app'.
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 starting api_verifier
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 checking the status of api_verifier
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  73.106564] IPv6: ADDRCONF(NETDEV_UP): veth0dcc920: link is not ready
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  73.340512] IPv6: ADDRCONF(NETDEV_CHANGE): veth0dcc920: link becomes ready
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 Done pulling app container
Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 
'patch_dockerfile'.
Apr 20 21:13:03 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
  88.380041] docker0: port 2(veth0dcc920) entered forwarding state
Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 init: 
Switching to runlevel: 0
Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 
shutdown[3370]: shutting down for system halt
Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 
shutdown[3352]: shutting down for system halt
Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 
shutdownscript: Running shutdown script /var/run/google.shutdown.script
Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 iscsid: 
iscsid shutting down.
Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 
shutdownscript: Finished running shutdown script 
/var/run/google.shutdown.script
Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 google: 
shutdown script found in metadata.
---------------------------------------------------------------------
--------------------------Tail of app logs---------------------------
"  % Total    % Received % Xferd  Average Speed   Time    Time     Time 
 Current"
---------------------------------------------------------------------
Current app health: 0
"---------------App was unhealthy, grabbing debug logs----------------"
App logs not found.
"                                 Dload  Upload   Total   Spent    Left 
 Speed"
Sending SIGUSR1 to fluentd to trigger a log flush.
App shutdown handlers done.
Container CID file not found.
---------------------------------------------------------------------
"
0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) 
couldn't connect to host"
47e7078ed32619023f23f15022d4965381f0abc634ea9e48d173e312beb43b51
Triggering app shutdown handlers.
-------------------------VM shutdown initiated------------------------





-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/70dd6af8-5c91-4830-b6f0-0f89ac09ddca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to