Hi George,
I was able to do a test deployment tonight by defining the new health
checks as you recommended.
Before I continue: my test for this is based on performing a deployment, as
we see the exact same behaviour there, with the VM starting up and crashing,
as we did with the incident on Monday.
The good news is that this seems to have completely resolved our deployment
issue: deployments are once again successful in a reasonable amount of
time, rather than timing out and failing because of the crashing VM. So
at this point I'm semi-confident that it has also resolved the issue we
experienced on Monday, when the VM was restarted and couldn't start up
again. That one is difficult to prove from my side at the moment.
I replaced the legacy health_check block in our .yaml file with the
following:-
liveness_check:
  path: "/_ah/health"
  initial_delay_sec: 300
  check_interval_sec: 5
  timeout_sec: 5
  failure_threshold: 3
  success_threshold: 1
readiness_check:
  path: "/login"
  app_start_timeout_sec: 300
  check_interval_sec: 30
  timeout_sec: 5
  failure_threshold: 3
  success_threshold: 1
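
To sanity-check these values, here is a rough sketch of the timings I would expect them to imply. It assumes that a check counts as failed after failure_threshold consecutive failures spaced check_interval_sec apart, and as healthy again after success_threshold consecutive successes. That is my reading of the settings, not something I have confirmed, and the function names are my own.

```python
# Health check settings copied from the yaml above.
liveness_check = {
    "initial_delay_sec": 300,
    "check_interval_sec": 5,
    "timeout_sec": 5,
    "failure_threshold": 3,
    "success_threshold": 1,
}
readiness_check = {
    "app_start_timeout_sec": 300,
    "check_interval_sec": 30,
    "timeout_sec": 5,
    "failure_threshold": 3,
    "success_threshold": 1,
}


def seconds_to_mark_failed(check):
    """Approximate seconds of consecutive failures before the check trips.

    ASSUMPTION: one probe per check_interval_sec, and the check trips after
    failure_threshold consecutive failures.
    """
    return check["check_interval_sec"] * check["failure_threshold"]


def seconds_to_mark_healthy(check):
    """Approximate seconds of consecutive successes before recovery.

    ASSUMPTION: same probing model as seconds_to_mark_failed.
    """
    return check["check_interval_sec"] * check["success_threshold"]


if __name__ == "__main__":
    for name, check in (("liveness", liveness_check),
                        ("readiness", readiness_check)):
        print(name,
              "fails after ~%ds," % seconds_to_mark_failed(check),
              "recovers after ~%ds" % seconds_to_mark_healthy(check))
```

Under that assumption the liveness check would trip after roughly 15 seconds of consecutive failures (once the 300 second initial delay has passed), and the readiness check after roughly 90 seconds.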
The most obvious question I have at this point is: why? Why would this
resolve the issue? I can only guess that it could be related to the
new-style split health checks (liveness/readiness) being enabled by default
when we had neither executed:
gcloud beta app update --split-health-checks --project [YOUR_PROJECT_ID]
nor provided the liveness_check/readiness_check blocks in our yaml file?
I've only just learnt about these updated health checks here:
<https://cloud.google.com/appengine/docs/flexible/php/configuring-your-app-with-app-yaml#configuring_supervisord_in_the_php_runtime>
It's not something we keep up to date with once we have a desired
configuration, so I'm concerned that there was a backwards compatibility
issue here.
I'm performing a couple more deployments to satisfy myself that this is not
a fluke.
As a side question, I have been seeing these entries in our logs since
activating the new health checks:
<https://lh3.googleusercontent.com/-yUuumRDccTA/WpcdBRBtY7I/AAAAAAAAABI/I_2N-EwYdWIzv6RJKaLcnxRJdyVoQs7IACLcBGAs/s1600/readiness_check.png>
<https://lh3.googleusercontent.com/-1q52sNXnYFI/WpcdFCILrbI/AAAAAAAAABM/NZrQfaazq0cwAMCYbvujUfE1l9J6iZynwCLcBGAs/s1600/liveness_check.png>
These don't seem to obey the configuration I defined (as per the snippet
above), most notably the path and the interval.
I'd like to learn if I'm doing anything wrong here or if there is an
explanation.
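
For what it's worth, this is roughly how I would compare the logged requests against the configured paths and intervals. The log entries below are made-up placeholders, not values taken from our logs or from the screenshots, and the helper names are my own. It is only a sketch of the comparison.

```python
from datetime import datetime

# path -> check_interval_sec, as configured in our yaml file.
CONFIGURED = {"/_ah/health": 5, "/login": 30}

# HYPOTHETICAL log entries (timestamp, request path). Placeholders only.
entries = [
    ("2018-02-28T21:00:00", "/_ah/health"),
    ("2018-02-28T21:00:02", "/_ah/health"),
    ("2018-02-28T21:00:04", "/_ah/health"),
    ("2018-02-28T21:00:00", "/example_other_path"),
    ("2018-02-28T21:00:10", "/example_other_path"),
]


def observed_gaps(log_entries):
    """Return {path: [seconds between consecutive requests]}."""
    by_path = {}
    for timestamp, path in log_entries:
        by_path.setdefault(path, []).append(datetime.fromisoformat(timestamp))
    gaps = {}
    for path, times in by_path.items():
        times.sort()
        gaps[path] = [(later - earlier).total_seconds()
                      for earlier, later in zip(times, times[1:])]
    return gaps


def unexpected_paths(log_entries, configured):
    """Return the sorted request paths that are not in the configuration."""
    return sorted({path for _, path in log_entries} - set(configured))


if __name__ == "__main__":
    for path, gaps in observed_gaps(entries).items():
        print(path, "configured:", CONFIGURED.get(path), "observed gaps:", gaps)
    print("paths not in our config:", unexpected_paths(entries, CONFIGURED))
```

Gaps shorter than the configured interval, or paths missing from CONFIGURED, would be the discrepancies I am asking about.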
Many thanks again and looking forward to hearing from you.
Karl
On Wednesday, February 28, 2018 at 8:12:09 PM UTC, Karl Tinawi wrote:
>
> Hi George,
>
> Yes that's correct - it's happened once outside of deployments.
>
> To answer your questions sir:
>
> - We require a custom PHP installation in order to make use of modules
> that are missing from Google's offering. I've not checked the latest list
> of extensions, but we may be able to move back to using the
> standard PHP image, so I'll check this for sure.
> - Scaling is another challenge that we're looking at, and we're certainly
> aware that we need to move to auto scaling for contingency etc...
> - I'll test configuring the readiness check and report back if we
> notice any difference in behaviour.
>
> Were the logs helpful? I'd be grateful if you could shed some light on the
> investigation your end. This is the first time we've noticed an issue such
> as this during the maintenance process, which should be innocuous and
> invisible to us.
>
> At this point I'm unsure whether the issues we face during deployments,
> which continue to occur daily, are related to the incident that happened
> with our running app. It's worth noting that the behaviour of the VM is
> identical (abrupt restarts while it's trying to boot). I may look at
> trying a test deployment using another image and seeing if that helps.
>
>
> Many thanks again,
>
> Karl
>
>
> On Wednesday, February 28, 2018 at 12:52:06 AM UTC, George (Cloud Platform
> Support) wrote:
>>
>> Hello Karl,
>>
>> You seem to indicate that the outage is a one-time event, and that there
>> is no other similar occurrence as yet. If this is so, to prevent similar
>> unwanted events in future, you may configure your app for health checks, in
>> detail. For reference, the "Configuring your App with app.yaml" page should
>> prove of great help. In your app.yaml, you can specify either a liveness
>> check (choosing appropriate parameter values):
>>
>> liveness_check:
>>   path: "/liveness_check"
>>   check_interval_sec: 30
>>   timeout_sec: 4
>>   failure_threshold: 2
>>   success_threshold: 2
>>
>> or readiness check:
>>
>> readiness_check:
>>   path: "/readiness_check"
>>   check_interval_sec: 5
>>   timeout_sec: 4
>>   failure_threshold: 2
>>   success_threshold: 2
>>   app_start_timeout_sec: 300
>>
>> It is worth noting that the usual way of specifying PHP for your app
>> is:
>>
>> runtime: php  # This setting is required. It is the name of the App
>>               # Engine language runtime used by this application.
>>               # To specify PHP, use php.
>> env: flex
>>
>> Your app uses runtime: custom, by contrast.
>>
>> You may also switch from manual scaling with only one instance to
>> automatic scaling. If this makes a difference in your app's behavior, the
>> information would help us with debugging.
>>
>>
>>