I just redeployed. It has been running for the last few hours, and so far I haven't seen this happen again. I'll report back after it has run for a few days. I did notice some slightly unusual and possibly buggy behaviour, though:

1. During the "set up" process, the VM boot code still tries to health check the app. It then prints a message about giving up and sends /_ah/start. It seems to me that if health checking is disabled, it shouldn't do this (logs below).
2. I observed one instance that didn't accept traffic for a long time. It was stuck pulling a Docker image (details below). It eventually continued and started correctly, so this isn't a huge concern.

*VM Boot Health Checking*

Logs on all machines have the following:

Apr 22 15:53:21 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:21 vm_runtime_init: start 'ah_start'.
Apr 22 15:53:21 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:21 vm_runtime_init: ah_start: INFO: app container running
Apr 22 15:53:21 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:21 vm_runtime_init: ah_start: app not healthy, won't send /_ah/start yet.
* ... repeated multiple times, with access.log showing 500 errors fetching /_ah/health *
Apr 22 15:55:13 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:55:13 vm_runtime_init: ah_start: WARNING: never got healthy response from app, but sending /_ah/start query anyway.
Apr 22 15:55:15 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:55:15 vm_runtime_init: Done start 'ah_start'.

At this point it starts successfully, and nothing seems to hit /_ah/health again in access.log.

*"Stuck" running docker pull for 20 minutes*

One instance appeared to fail to accept traffic (CPU utilization was 0%), so I SSHed to it and found it stuck running docker pull. ps axf showed the following process tree (truncated):

2771 ? S 0:00 | \_ /bin/bash /usr/share/google/run-scripts /var/run/google.startup.script startup
2772 ? S 0:00 | \_ /bin/bash /usr/share/google/run-scripts /var/run/google.startup.script startup
2907 ? S 0:00 | \_ /bin/sh /usr/local/bin/gcloud docker pull us.gcr.io/(TRUNCATED)
2915 ? S 0:00 | \_ python -S /usr/local/bin/../share/google/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py docker pull us.gcr.io/(TRUNCATED)
2923 ? Sl 0:00 | \_ docker pull us.gcr.io/(TRUNCATED)

syslog shows this process was stuck for 18 minutes:

Apr 22 15:36:33 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:36:33 vm_runtime_init: start 'pull_app'.
Apr 22 15:36:35 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:36:35 Pulling GAE_FULL_APP_CONTAINER: us.gcr.io/triggeredmail/appengine/integration-track-gce.20160422t105255@sha256:52dde4c6b3d053419c247afa51d4ec4093392bba5bd7f713639cbc92e561bccf
Apr 22 15:45:33 gae-integration--track--gce-20160422t105255-pd6s vm_unlocker: Restarting OpenBSD Secure Shell server: sshd.
Apr 22 15:53:09 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:09 Done pulling app container
Apr 22 15:53:09 gae-integration--track--gce-20160422t105255-pd6s vm_runtime_init: Apr 22 15:53:09 vm_runtime_init: Done start 'pull_app'.
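
In case it helps anyone checking their own instances, here is a rough Python sketch for measuring how long a vm_runtime_init phase took, based only on the syslog lines quoted in this message. It is not part of the VM tooling: the function name phase_seconds is hypothetical, the year default is an assumption (syslog timestamps omit the year), and it assumes each phase logs "start '<phase>'" and "Done start '<phase>'" exactly as shown for pull_app and ah_start.

```python
import re
from datetime import datetime

# Leading syslog timestamp, e.g. "Apr 22 15:36:33" (no year in the log).
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def phase_seconds(lines, phase, year=2016):
    """Seconds between "start '<phase>'" and "Done start '<phase>'".

    Returns None if either marker is missing. `year` is an assumption,
    needed only because syslog timestamps do not carry one.
    """
    begin = end = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        # Check the "Done" marker first: it contains the start marker as a substring.
        if f"Done start '{phase}'" in line:
            end = when
        elif f"start '{phase}'" in line:
            begin = when
    if begin is None or end is None:
        return None
    return (end - begin).total_seconds()
```

Usage would be something like phase_seconds(open("/var/log/syslog"), "pull_app"); the log path is also an assumption and may differ on these images.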
