Kevin Sweeney created AURORA-224:
------------------------------------
Summary: Make health checking more configurable in updater
Key: AURORA-224
URL: https://issues.apache.org/jira/browse/AURORA-224
Project: Aurora
Issue Type: Story
Components: Client
Reporter: Kevin Sweeney
Right now the updater treats an instance that passed its health check once
but later fails one as unconditionally failed [1] and restarts it. During
startup a service could conceivably respond affirmatively to /health and then
later time out its requests. Consider making the behavior of the HTTP health
checker more configurable during updates.
[1]
https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/api/instance_watcher.py#L91
{code}
def maybe_set_instance_unhealthy(instance_id, retriable):
  # An instance that was previously healthy and currently unhealthy has failed.
  if instance_id in instance_states:
    log.info('Instance %s is unhealthy' % instance_id)
    instance_states[instance_id].set_healthy(False)
  # If the restart threshold has expired or if the instance cannot be retried
  # it is unhealthy.
  elif now > expected_healthy_by or not retriable:
    log.info('Instance %s was not reported healthy within %d seconds' % (
        instance_id, self._restart_threshold))
    instance_states[instance_id] = Instance(finished=True)
{code}
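As a rough illustration of what "more configurable" could mean here, the
sketch below tolerates a configurable number of consecutive failed checks
before declaring a previously healthy instance failed. Everything in it is
hypothetical: the class TolerantInstanceState and the parameter
max_consecutive_failures are not part of the Aurora client, and it covers only
the "was healthy, now unhealthy" branch of the excerpt above, not the restart
threshold branch.

```python
class TolerantInstanceState(object):
  """Hypothetical per-instance health state; not Aurora's Instance class."""

  def __init__(self, max_consecutive_failures=0):
    # Assumed knob: how many failed checks in a row are tolerated.
    # The default of 0 mirrors the current behavior (first failure is fatal).
    self.max_consecutive_failures = max_consecutive_failures
    self.consecutive_failures = 0
    self.healthy = False
    self.finished = False

  def record_healthy(self):
    # A passing check resets the failure streak.
    self.healthy = True
    self.consecutive_failures = 0

  def record_unhealthy(self):
    # Only mark the instance failed once the streak exceeds the tolerance.
    self.consecutive_failures += 1
    if self.consecutive_failures > self.max_consecutive_failures:
      self.healthy = False
      self.finished = True


# Usage: tolerate two transient failures after the first healthy report.
state = TolerantInstanceState(max_consecutive_failures=2)
state.record_healthy()
state.record_unhealthy()
state.record_unhealthy()
print(state.finished)
state.record_unhealthy()
print(state.finished)
```

With the default of 0 the first failed check after a healthy one is still
fatal, so existing update configurations would behave as they do today.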
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)