I just redeployed. It has been running for the last few hours, and so far I 
haven't seen this happen again. I'll report back after it has run for a few 
days. I did notice some slightly unusual and possibly buggy behaviour 
though:

1. During the "set up" process, the VM boot code still polls the health 
check even though health checking is disabled. It eventually gives up, logs 
a warning, and sends /_ah/start anyway. It seems to me that if health 
checking is disabled, it shouldn't poll at all (logs below).

2. I observed one instance that didn't accept traffic for a long time. It 
was stuck pulling a Docker image (details below). It eventually continued 
and started correctly, so this isn't a huge concern.
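To make point 1 concrete, here is a minimal Python sketch of the behaviour I'd expect from the 'ah_start' step. I don't know how vm_runtime_init is actually implemented; the function name, its parameters, and the 120-second default timeout are hypothetical (the timeout just roughly matches the ~2 minute gap between 15:53:21 and 15:55:13 in the logs below). The only point is that when health checking is disabled, the polling loop should be skipped entirely.

```python
import time
from typing import Callable


def ah_start(
    health_check_enabled: bool,
    is_healthy: Callable[[], bool],
    send_start: Callable[[], None],
    timeout_s: float = 120.0,  # hypothetical value, not taken from the real code
    poll_interval_s: float = 5.0,  # hypothetical value
    log: Callable[[str], None] = print,
) -> bool:
    """Hypothetical sketch of the boot step; not the real vm_runtime_init code.

    Returns True only if the app reported healthy before /_ah/start was sent.
    """
    if not health_check_enabled:
        # Expected behaviour: no polling at all, just send /_ah/start.
        send_start()
        return False

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_healthy():
            send_start()
            return True
        log("ah_start: app not healthy, won't send /_ah/start yet.")
        time.sleep(poll_interval_s)

    # What the logs show today: give up on health checks and send /_ah/start anyway.
    log("ah_start: WARNING: never got healthy response from app, "
        "but sending /_ah/start query anyway.")
    send_start()
    return False
```

With health_check_enabled=False, is_healthy is never called and none of the "app not healthy" lines are logged.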


*VM Boot Health Checking*

Logs on all machines have the following:

Apr 22 15:53:21 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:21 vm_runtime_init: start 'ah_start'.
Apr 22 15:53:21 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:21 vm_runtime_init: ah_start: INFO: app container running
Apr 22 15:53:21 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:21 vm_runtime_init: ah_start: app not healthy, won't send /_ah/start yet.

*... repeated multiple times, with access.log showing 500 errors fetching /_ah/health ...*

Apr 22 15:55:13 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:55:13 vm_runtime_init: ah_start: WARNING: never got healthy response  from app, but sending /_ah/start query anyway.
Apr 22 15:55:15 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:55:15 vm_runtime_init: Done start 'ah_start'.


At this point it starts successfully, and nothing seems to try hitting 
/_ah/health again in the access.log.
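If anyone wants to check their own instances for the same pattern, something like this Python sketch tallies the status codes returned for /_ah/health in access.log. It assumes the log uses the common/combined log format (a quoted request line followed by the status code); I haven't confirmed that is the exact format on these VMs, and the function name is made up.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed access.log format (common/combined), e.g.:
#   10.0.0.1 - - [22/Apr/2016:15:53:25 +0000] "GET /_ah/health HTTP/1.1" 500 0
# Group 1 is the request path, group 2 the HTTP status code.
_REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3})')


def status_counts(lines: Iterable[str], path: str = "/_ah/health") -> Counter:
    """Count status codes of requests whose path (ignoring any query string) equals `path`."""
    counts: Counter = Counter()
    for line in lines:
        match = _REQUEST_RE.search(line)
        if match and match.group(1).split("?", 1)[0] == path:
            counts[int(match.group(2))] += 1
    return counts
```

Passing an open access.log file to status_counts would show how many 500s the health check got, and running it again later would confirm that nothing requests /_ah/health after startup.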



*"Stuck" running docker pull for 20 minutes*


One instance appeared to fail to accept traffic (CPU utilization was 0%), 
so I SSHed to it and found it stuck running docker pull. ps axf showed the 
following process tree (truncated):


 2771 ?        S      0:00  |   \_ /bin/bash /usr/share/google/run-scripts /var/run/google.startup.script startup
 2772 ?        S      0:00  |       \_ /bin/bash /usr/share/google/run-scripts /var/run/google.startup.script startup
 2907 ?        S      0:00  |           \_ /bin/sh /usr/local/bin/gcloud docker pull us.gcr.io/(TRUNCATED)
 2915 ?        S      0:00  |               \_ python -S /usr/local/bin/../share/google/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py docker pull us.gcr.io/(TRUNCATED)
 2923 ?        Sl     0:00  |                   \_ docker pull us.gcr.io/(TRUNCATED)



syslog shows this process was stuck for about 17 minutes (15:36:33 to 15:53:09):



Apr 22 15:36:33 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:36:33 vm_runtime_init: start 'pull_app'.
Apr 22 15:36:35 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:36:35 Pulling GAE_FULL_APP_CONTAINER: us.gcr.io/triggeredmail/appengine/integration-track-gce.20160422t105255@sha256:52dde4c6b3d053419c247afa51d4ec4093392bba5bd7f713639cbc92e561bccf
Apr 22 15:45:33 gae-integration--track--gce-20160422t105255-pd6s vm_unlocker: Restarting OpenBSD Secure Shell server: sshd.
Apr 22 15:53:09 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:09 Done pulling app container
Apr 22 15:53:09 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:09 vm_runtime_init: Done start 'pull_app'.
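The step timings can be pulled out of syslog with a small Python sketch like this. It assumes each relevant line contains the inner "vm_runtime_init: <timestamp> vm_runtime_init: start '<step>'" and "Done start '<step>'" markers exactly as they appear above; the helper name is hypothetical. It matches each line independently, so it works on the raw syslog file.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Matches the inner vm_runtime_init timestamp plus the step marker, e.g.
#   "... vm_runtime_init: Apr 22 15:36:33 vm_runtime_init: start 'pull_app'."
_STEP_RE = re.compile(
    r"vm_runtime_init: (\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"vm_runtime_init: (start|Done start) '([^']+)'"
)


def step_duration(lines: Iterable[str], step: str) -> Optional[timedelta]:
    """Time between "start '<step>'" and "Done start '<step>'", or None if either is missing."""
    start = end = None
    for line in lines:
        match = _STEP_RE.search(line)
        if not match or match.group(3) != step:
            continue
        # syslog timestamps carry no year; both ends get the same default year,
        # so the difference is still correct (assuming no year rollover).
        ts = datetime.strptime(match.group(1), "%b %d %H:%M:%S")
        if match.group(2) == "start":
            start = ts
        else:
            end = ts
    if start is None or end is None:
        return None
    return end - start
```

For the pull_app lines above this gives 16 minutes 36 seconds; fed the ah_start lines from the first section, it gives 1 minute 54 seconds.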

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/33fb6b71-082c-4b8f-97c8-d478908ed6d9%40googlegroups.com.