[google-appengine] Re: Flexible environment auto-scaling: Instances stopped after ~2 minutes, before app even runs

'Nicholas (Google Cloud Support)' via Google App Engine Thu, 21 Apr 2016 13:04:10 -0700

Thanks for the information.

Could you try setting up health checking? If not, explicitly disabling it 
in app.yaml:
health_check:
  enable_health_check: False


Regarding your docker image size, this can be reduced by using a 
.dockerignore 
<https://docs.docker.com/engine/reference/builder/#dockerignore-file> file. 
Unfortunately, I can't find any official documentation stating explicitly 
that skip_files is ignored with such runtimes but this is the case. Try 
populating a .dockerfile with files and directories you wish to exclude and 
see how it affects your build. It should at least reduce build time.

Hope this helps.

On Wednesday, April 20, 2016 at 6:14:46 PM UTC-4, Evan Jones wrote:
>
> *TL;DR*: I'm seeing the flexible environment autoscaler start a machine, 
> then stop it 2 minutes later, before it has even finished starting the 
> application. Is this a misconfiguration on my side or a bug? Could the 
> issue be that our application loads too slowly? Can I reconfigure this 
> somehow? We see lots of instances "fail to start" like this, and I can't 
> figure out why exactly.
>
>
> *Longer description*
>
> We've been experimenting with moving parts of our application to the new 
> "Flexible" environment. So far it works well! However, I've noticed that 
> the autoscaler is extremely aggressive at stopping instances, which 
> contradicts the documentation. I'm assuming the Flexible environment using 
> the GCE autoscaler. The documentation there states "the 10 minute delay 
> prevents virtual machine instances from being terminated before the 
> instance has been running for 10 minutes. Cool down periods for new 
> instances are ignored when deciding whether to scale down a group":
>
>
> https://cloud.google.com/compute/docs/autoscaler/understanding-autoscaler-decisions
>
> However, I've been looking at when instances are created and stopped, and 
> I see many that get stopped after ~2-3 minutes before the machine has even 
> booted! It seems to me from the description above, it should let my machine 
> run for at least 10 minutes before stopping it. Could this be getting 
> killed by some sort of health checking? This module is using the defaults, 
> which looks like it should wait at least 5 minutes before declaring an 
> instance unhealthy.
>
> I've actually seen a number of other issues that seem to cause VMs to fail 
> to start:
>
> * Long sequence of "API is not available yet because of exception: The 
> remote RPC to the application server failed for call 
> modules.GetNumInstances()" errors, followed by shutdown after about 
>
>
> *Technical details*
>
> In the compute engine API logs, I see some create/delete records like the 
> following, showing an instance created then deleted within ~2 minutes:
>
>
> 17:11:15.264 compute.instances.insert type:GCE_API_CALL resource 
> id:"2166106899089084061"
> 17:11:51.535   type:GCE_OPERATION_DONE (matching the previous call)
> 17:13:40.427 compute.instances.delete type:GCE_API_CALL resource 
> id:"2166106899089084061"
> 17:14:30.572   type":GCE_OPERATION_DONE
>
>
> When I search for logs from this instance itself, the last messages across 
> all logs are from syslog, showing it was still running the "startup" 
> process provided by Google when the shutdown signal from the delete 
> operation was executed. It seems from this log it didn't even get to our 
> code; this is killed while running Google's code I think. I see similar 
> issues where instances run for ~5 minutes, and have a sequence of errors 
> related to "api_verifier" returning many errors.
>
> Any ideas what is causing this, and what we can do to make it happen less 
> often?
>
> Thank you!
>
>
>
>
> Apr 20 21:11:50 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:11:50 Pulling GAE_FULL_APP_CONTAINER: 
> us.gcr.io/TRUNCATED@sha256:f70b09a8217430bde2e45d6ccc868e3699b92f965eff12cc6ed19865ce131585
> Apr 20 21:12:03 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   27.964060] docker0: port 1(veth75e5c89) entered forwarding state
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   72.892834] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the 
> protection by the permission bits on the lower branch
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: 'build_app': skipping for 
> gcloud based deployment.
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: 'patch-dockerfile': 
> skipping for empty patch.
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   73.034744] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the 
> protection by the permission bits on the lower branch
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'patch_dockerfile'.
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   73.348051] docker0: port 2(veth0dcc920) entered forwarding state
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   73.076657] aufs au_opts_verify:1570:docker[2262]: dirperm1 breaks the 
> protection by the permission bits on the lower branch
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'api_verifier'.
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 'build_app'.
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   73.354433] docker0: port 2(veth0dcc920) entered forwarding state
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: start 'build_app'.
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   73.094180] device veth0dcc920 entered promiscuous mode
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 'pull_app'.
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 starting api_verifier
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 checking the status of api_verifier
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   73.106564] IPv6: ADDRCONF(NETDEV_UP): veth0dcc920: link is not ready
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   73.340512] IPv6: ADDRCONF(NETDEV_CHANGE): veth0dcc920: link becomes ready
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 Done pulling app container
> Apr 20 21:12:48 gae-integration--track--gce-20160415t135311-cyl8 
> vm_runtime_init: Apr 20 21:12:48 vm_runtime_init: Done start 
> 'patch_dockerfile'.
> Apr 20 21:13:03 gae-integration--track--gce-20160415t135311-cyl8 kernel: [ 
>   88.380041] docker0: port 2(veth0dcc920) entered forwarding state
> Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 init: 
> Switching to runlevel: 0
> Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 
> shutdown[3370]: shutting down for system halt
> Apr 20 21:13:42 gae-integration--track--gce-20160415t135311-cyl8 
> shutdown[3352]: shutting down for system halt
> Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 
> shutdownscript: Running shutdown script /var/run/google.shutdown.script
> Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 iscsid: 
> iscsid shutting down.
> Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 
> shutdownscript: Finished running shutdown script 
> /var/run/google.shutdown.script
> Apr 20 21:13:43 gae-integration--track--gce-20160415t135311-cyl8 google: 
> shutdown script found in metadata.
> ---------------------------------------------------------------------
> --------------------------Tail of app logs---------------------------
> "  % Total    % Received % Xferd  Average Speed   Time    Time     Time 
>  Current"
> ---------------------------------------------------------------------
> Current app health: 0
> "---------------App was unhealthy, grabbing debug logs----------------"
> App logs not found.
> "                                 Dload  Upload   Total   Spent    Left 
>  Speed"
> Sending SIGUSR1 to fluentd to trigger a log flush.
> App shutdown handlers done.
> Container CID file not found.
> ---------------------------------------------------------------------
> "
> 0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) 
> couldn't connect to host"
> 47e7078ed32619023f23f15022d4965381f0abc634ea9e48d173e312beb43b51
> Triggering app shutdown handlers.
> -------------------------VM shutdown initiated------------------------
>
>
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/f0ad7afe-5b29-4664-988d-f24f79bd5278%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[google-appengine] Re: Flexible environment auto-scaling: Instances stopped after ~2 minutes, before app even runs

Reply via email to