Repository: aurora Updated Branches: refs/heads/master c934a1258 -> b3f100966
Refactoring HealthCheckConfig into separate structs Bugs closed: AURORA-1562 Reviewed at https://reviews.apache.org/r/41428/ Project: http://git-wip-us.apache.org/repos/asf/aurora/repo Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/b3f10096 Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/b3f10096 Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/b3f10096 Branch: refs/heads/master Commit: b3f1009668cd3c95434d2d195f89e1041314286e Parents: c934a12 Author: Dmitriy Shirchenko <[email protected]> Authored: Thu Dec 17 12:33:55 2015 -0800 Committer: Bill Farner <[email protected]> Committed: Thu Dec 17 12:33:55 2015 -0800 ---------------------------------------------------------------------- docs/configuration-reference.md | 35 ++++++++++--- src/main/python/apache/aurora/client/config.py | 48 ++++++++---------- .../python/apache/aurora/config/schema/base.py | 21 +++++++- .../aurora/executor/common/health_checker.py | 34 ++++++++----- .../python/apache/aurora/client/test_config.py | 53 ++++++++++++++------ .../executor/common/test_health_checker.py | 53 ++++++++++++++++++-- .../http/http_example_bad_healthcheck.aurora | 6 ++- 7 files changed, 180 insertions(+), 70 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/aurora/blob/b3f10096/docs/configuration-reference.md ---------------------------------------------------------------------- diff --git a/docs/configuration-reference.md b/docs/configuration-reference.md index 07149c7..ba486d8 100644 --- a/docs/configuration-reference.md +++ b/docs/configuration-reference.md @@ -333,7 +333,7 @@ Job Schema ```max_task_failures``` | Integer | Maximum number of failures after which the task is considered to have failed (Default: 1) Set to -1 to allow for infinite failures ```priority``` | Integer | Preemption priority to give the task (Default 0). Tasks with higher priorities may preempt tasks at lower priorities. ```production``` | Boolean | Whether or not this is a production task that may [preempt](resources.md#task-preemption) other tasks (Default: False). Production job role must have the appropriate [quota](resources.md#resource-quota). - ```health_check_config``` | ```HealthCheckConfig``` object | Parameters for controlling a task's health checks via HTTP. Only used if a health port was assigned with a command line wildcard. + ```health_check_config``` | ```HealthCheckConfig``` object | Parameters for controlling a task's health checks. HTTP health check is only used if a health port was assigned with a command line wildcard. ```container``` | ```Container``` object | An optional container to run all processes inside of. ```lifecycle``` | ```LifecycleConfig``` object | An optional task lifecycle configuration that dictates commands to be executed on startup/teardown. HTTP lifecycle is enabled by default if the "health" port is requested. See [LifecycleConfig Objects](#lifecycleconfig-objects) for more information. ```tier``` | String | Task tier type. When set to `revocable` requires the task to run with Mesos revocable resources. This is work [in progress](https://issues.apache.org/jira/browse/AURORA-1343) and is currently only supported for the revocable tasks. The ultimate goal is to simplify task configuration by hiding various configuration knobs behind a task tier definition. See AURORA-1343 and AURORA-1443 for more details. @@ -380,20 +380,41 @@ Parameters for controlling the rate and policy of rolling updates. ### HealthCheckConfig Objects +*Note: ```endpoint```, ```expected_response``` and ```expected_response_code``` are deprecated from ```HealthCheckConfig``` and must be definied in ```HttpHealthChecker```.* + Parameters for controlling a task's health checks via HTTP or a shell command. -| object | type | description +| param | type | description | ------- | :-------: | -------- -| ```endpoint``` | String | HTTP endpoint to check (Default: /health) -| ```expected_response``` | String | If not empty, fail the HTTP health check if the response differs. Case insensitive. (Default: ok) -| ```expected_response_code``` | Integer | If not zero, fail the HTTP health check if the response code differs. (Default: 0) +| *```endpoint```* | String | HTTP endpoint to check (Default: /health) **Deprecated.** +| *```expected_response```* | String | If not empty, fail the HTTP health check if the response differs. Case insensitive. (Default: ok) **Deprecated.** +| *```expected_response_code```* | Integer | If not zero, fail the HTTP health check if the response code differs. (Default: 0) **Deprecated.** +| ```health_checker``` | HealthCheckerConfig | Configure what kind of health check to use. | ```initial_interval_secs``` | Integer | Initial delay for performing a health check. (Default: 15) | ```interval_secs``` | Integer | Interval on which to check the task's health. (Default: 10) | ```max_consecutive_failures``` | Integer | Maximum number of consecutive failures that will be tolerated before considering a task unhealthy (Default: 0) -| ```shell_command``` | String | An alternative to HTTP health checking. Specifies a shell command that will be executed. Any non-zero exit status will be interpreted as a health check failure. -| ```type``` | String | 'http' or 'shell'. (Default: 'http') | ```timeout_secs``` | Integer | HTTP request timeout. (Default: 1) +### HealthCheckerConfig Objects +| param | type | description +| ------- | :-------: | -------- +| ```http``` | HttpHealthChecker | Configure health check to use HTTP. (Default) +| ```shell``` | ShellHealthChecker | Configure health check via a shell command. + + +### HttpHealthChecker Objects +| param | type | description +| ------- | :-------: | -------- +| ```endpoint``` | String | HTTP endpoint to check (Default: /health) +| ```expected_response``` | String | If not empty, fail the HTTP health check if the response differs. Case insensitive. (Default: ok) +| ```expected_response_code``` | Integer | If not zero, fail the HTTP health check if the response code differs. (Default: 0) + +### ShellHealthChecker Objects +| param | type | description +| ------- | :-------: | -------- +| ```shell_command``` | String | An alternative to HTTP health checking. Specifies a shell command that will be executed. Any non-zero exit status will be interpreted as a health check failure. + + ### Announcer Objects If the `announce` field in the Job configuration is set, each task will be http://git-wip-us.apache.org/repos/asf/aurora/blob/b3f10096/src/main/python/apache/aurora/client/config.py ---------------------------------------------------------------------- diff --git a/src/main/python/apache/aurora/client/config.py b/src/main/python/apache/aurora/client/config.py index 161c362..a4fa485 100644 --- a/src/main/python/apache/aurora/client/config.py +++ b/src/main/python/apache/aurora/client/config.py @@ -22,7 +22,7 @@ import math import re import sys -from pystachio.composite import Empty +from twitter.common import log from apache.aurora.client import binding_helper from apache.aurora.client.base import die @@ -70,22 +70,18 @@ def _validate_environment_name(config): __validate_env(env_name, 'Environment') -CANNOT_HAVE_HTTP_ARGS_WITH_SHELL_ERROR = ''' -shell_command does not support supplied http arguments. -''' - -CANNOT_HAVE_SHELL_ARGS_WITH_HTTP_ERROR = ''' -Cannot define shell_commmand for HTTP health check. -''' - -INVALID_HEALTH_CHECK_TYPE = ''' -Invalid health check type {health_check_type}. +INVALID_HEALTH_CHECKER = ''' +Invalid health endpoint config. ''' MUST_PROVIDE_SHELL_COMMAND_ERROR = ''' -Must provide a shell command for shell type. +Must provide a shell command if using ShellHealthChecker. ''' +HTTP_DEPRECATION_WARNING = ''' +WARNING: endpoint, expected_response, and expected_response_code are deprecated and will be removed +in the next release. Please consult updated documentation. +''' HTTP_HEALTH_CHECK = 'http' SHELL_HEALTH_CHECK = 'shell' @@ -93,22 +89,22 @@ SHELL_HEALTH_CHECK = 'shell' # TODO (AURORA-1552): Add config validation to the executor def _validate_health_check_config(config): - health_check_config = config.health_check_config() - health_check_type = health_check_config.type().get() - - # Make sure we either have HTTP or SHELL. - if health_check_type not in {HTTP_HEALTH_CHECK, SHELL_HEALTH_CHECK}: - die(INVALID_HEALTH_CHECK_TYPE.format(health_check_type=health_check_type)) - if health_check_type == SHELL_HEALTH_CHECK: - # SHELL options - shell_command = health_check_config.shell_command() - if shell_command == Empty: + health_check_config = config.health_check_config().get() + health_checker = health_check_config.get('health_checker', {}) + # If we have old-style of configuring. + # TODO (AURORA-1563): Remove this code after we drop support for defining these directly in + # HealthCheckConfig. + for deprecated in {'endpoint', 'expected_response', 'expected_response_code'}: + if deprecated in health_check_config: + log.warn(HTTP_DEPRECATION_WARNING) + break + if SHELL_HEALTH_CHECK in health_checker: + # Make sure we specified a shell_command if we chose a shell config. + shell_health_checker = health_checker.get(SHELL_HEALTH_CHECK, {}) + shell_command = shell_health_checker.get('shell_command') + if not shell_command: # Must define a command. die(MUST_PROVIDE_SHELL_COMMAND_ERROR) - elif health_check_type == HTTP_HEALTH_CHECK: - if health_check_config.shell_command() != Empty: - # No shell_command for HTTP. - die(CANNOT_HAVE_SHELL_ARGS_WITH_HTTP_ERROR) UPDATE_CONFIG_MAX_FAILURES_ERROR = ''' http://git-wip-us.apache.org/repos/asf/aurora/blob/b3f10096/src/main/python/apache/aurora/config/schema/base.py ---------------------------------------------------------------------- diff --git a/src/main/python/apache/aurora/config/schema/base.py b/src/main/python/apache/aurora/config/schema/base.py index e752482..69182c3 100644 --- a/src/main/python/apache/aurora/config/schema/base.py +++ b/src/main/python/apache/aurora/config/schema/base.py @@ -36,15 +36,32 @@ class UpdateConfig(Struct): pulse_interval_secs = Integer +class HttpHealthChecker(Struct): + endpoint = Default(String, '/health') + expected_response = Default(String, 'ok') + expected_response_code = Default(Integer, 0) + + +class ShellHealthChecker(Struct): + shell_command = String + + +class HealthCheckerConfig(Struct): + http = HttpHealthChecker + shell = ShellHealthChecker + + +DefaultHealthChecker = HealthCheckerConfig(http=HttpHealthChecker()) + + class HealthCheckConfig(Struct): endpoint = Default(String, '/health') expected_response = Default(String, 'ok') expected_response_code = Default(Integer, 0) + health_checker = Default(HealthCheckerConfig, DefaultHealthChecker) initial_interval_secs = Default(Float, 15.0) interval_secs = Default(Float, 10.0) max_consecutive_failures = Default(Integer, 0) - shell_command = String - type = Default(String, 'http') timeout_secs = Default(Float, 1.0) http://git-wip-us.apache.org/repos/asf/aurora/blob/b3f10096/src/main/python/apache/aurora/executor/common/health_checker.py ---------------------------------------------------------------------- diff --git a/src/main/python/apache/aurora/executor/common/health_checker.py b/src/main/python/apache/aurora/executor/common/health_checker.py index cba4e8c..01194aa 100644 --- a/src/main/python/apache/aurora/executor/common/health_checker.py +++ b/src/main/python/apache/aurora/executor/common/health_checker.py @@ -210,31 +210,39 @@ class HealthCheckerProvider(StatusCheckerProvider): :return: Instance of a HealthChecker. """ mesos_task = mesos_task_instance_from_assigned_task(assigned_task) - portmap = resolve_ports(mesos_task, assigned_task.assignedPorts) - health_check_config = mesos_task.health_check_config().get() - health_check_type = health_check_config.get('type') - - # We don't need a port if we are running a shell command. - if health_check_type == HTTP_HEALTH_CHECK and 'health' not in portmap: - return None + health_checker = health_check_config.get('health_checker', {}) timeout_secs = health_check_config.get('timeout_secs') - - if health_check_type == SHELL_HEALTH_CHECK: - shell_command = health_check_config.get('shell_command') + if SHELL_HEALTH_CHECK in health_checker: + shell_command = health_checker.get(SHELL_HEALTH_CHECK, {}).get('shell_command') shell_signaler = ShellHealthCheck( cmd=shell_command, timeout_secs=timeout_secs ) a_health_checker = lambda: shell_signaler() else: + portmap = resolve_ports(mesos_task, assigned_task.assignedPorts) + if 'health' not in portmap: + return None + if HTTP_HEALTH_CHECK in health_checker: + # Assume user has already switched over to the new config since we found the key. + http_config = health_checker.get(HTTP_HEALTH_CHECK, {}) + http_endpoint = http_config.get('endpoint') + http_expected_response = http_config.get('expected_response') + http_expected_response_code = http_config.get('expected_response_code') + else: + # TODO (AURORA-1563): Remove this clause after we deprecate support for following keys + # directly in HealthCheckConfig + http_endpoint = health_check_config.get('endpoint') + http_expected_response = health_check_config.get('expected_response') + http_expected_response_code = health_check_config.get('expected_response_code') http_signaler = HttpSignaler( portmap['health'], timeout_secs=timeout_secs) a_health_checker = lambda: http_signaler( - endpoint=health_check_config.get('endpoint'), - expected_response=health_check_config.get('expected_response'), - expected_response_code=health_check_config.get('expected_response_code') + endpoint=http_endpoint, + expected_response=http_expected_response, + expected_response_code=http_expected_response_code ) health_checker = HealthChecker( http://git-wip-us.apache.org/repos/asf/aurora/blob/b3f10096/src/test/python/apache/aurora/client/test_config.py ---------------------------------------------------------------------- diff --git a/src/test/python/apache/aurora/client/test_config.py b/src/test/python/apache/aurora/client/test_config.py index 8fd112f..de0973d 100644 --- a/src/test/python/apache/aurora/client/test_config.py +++ b/src/test/python/apache/aurora/client/test_config.py @@ -16,18 +16,22 @@ import os from io import BytesIO import pytest +from twitter.common import log from twitter.common.contextutil import temporary_dir from apache.aurora.client import config from apache.aurora.client.config import get_config as get_aurora_config +from apache.aurora.client.config import HTTP_DEPRECATION_WARNING from apache.aurora.config import AuroraConfig from apache.aurora.config.loader import AuroraConfigLoader from apache.aurora.config.schema.base import ( MB, Announcer, HealthCheckConfig, + HealthCheckerConfig, Job, Resources, + ShellHealthChecker, Task, UpdateConfig ) @@ -185,7 +189,6 @@ def test_health_check_config_http_ok(): name='hello_bond', role='james', cluster='marine-cluster', health_check_config=HealthCheckConfig( max_consecutive_failures=1, - type='http', ), task=Task(name='main', processes=[], resources=Resources(cpu=0.1, ram=64 * MB, disk=64 * MB))) @@ -195,10 +198,12 @@ def test_health_check_config_http_ok(): def test_health_check_config_shell_ok(): base_job = Job( name='hello_bond', role='james', cluster='marine-cluster', + health_check_config=HealthCheckConfig( max_consecutive_failures=1, - type='shell', - shell_command='foo bar' + health_checker=HealthCheckerConfig( + shell=ShellHealthChecker(shell_command='foo bar') + ) ), task=Task(name='main', processes=[], resources=Resources(cpu=0.1, ram=64 * MB, disk=64 * MB))) @@ -206,38 +211,42 @@ def test_health_check_config_shell_ok(): def test_health_check_config_invalid_type(): - base_job = Job( - name='hello_bond', role='james', cluster='marine-cluster', - health_check_config=HealthCheckConfig( + # Must be 'shell' or 'http' type of config. + with pytest.raises(ValueError): + Job( + name='hello_bond', role='james', cluster='marine-cluster', + health_check_config=HealthCheckConfig( max_consecutive_failures=1, - type='foo', + health_checker='foo', ), task=Task(name='main', processes=[], resources=Resources(cpu=0.1, ram=64 * MB, disk=64 * MB))) - with pytest.raises(SystemExit): - config._validate_health_check_config(AuroraConfig(base_job)) -def test_health_check_config_http_and_shell_defined(): +def test_health_check_config_deprecate_message(monkeypatch): base_job = Job( name='hello_bond', role='james', cluster='marine-cluster', health_check_config=HealthCheckConfig( max_consecutive_failures=1, - type='http', - shell_command='foo bar' + endpoint='/to_be_deprecated' ), task=Task(name='main', processes=[], resources=Resources(cpu=0.1, ram=64 * MB, disk=64 * MB))) - with pytest.raises(SystemExit): - config._validate_health_check_config(AuroraConfig(base_job)) + log_items = [] + def capture_log(msg): + log_items.append(msg) + monkeypatch.setattr(log, 'warn', capture_log) + config._validate_health_check_config(AuroraConfig(base_job)) + assert log_items == [HTTP_DEPRECATION_WARNING] def test_health_check_config_shell_no_command(): + # If we chose shell config, we must define shell_command. base_job = Job( name='hello_bond', role='james', cluster='marine-cluster', health_check_config=HealthCheckConfig( max_consecutive_failures=1, - type='shell', + health_checker=HealthCheckerConfig(shell=ShellHealthChecker()) ), task=Task(name='main', processes=[], resources=Resources(cpu=0.1, ram=64 * MB, disk=64 * MB))) @@ -245,6 +254,20 @@ def test_health_check_config_shell_no_command(): config._validate_health_check_config(AuroraConfig(base_job)) +def test_health_check_config_shell_empty_command(): + # shell_command must not be left empty. + base_job = Job( + name='hello_bond', role='james', cluster='marine-cluster', + health_check_config=HealthCheckConfig( + max_consecutive_failures=1, + health_checker=HealthCheckerConfig(shell=ShellHealthChecker(shell_command='')) + ), + task=Task(name='main', processes=[], + resources=Resources(cpu=0.1, ram=64 * MB, disk=64 * MB))) + with pytest.raises(SystemExit): + config._validate_health_check_config(AuroraConfig(base_job)) + + def test_update_config_passes_with_default_values(): base_job = Job( name='hello_world', role='john_doe', cluster='test-cluster', http://git-wip-us.apache.org/repos/asf/aurora/blob/b3f10096/src/test/python/apache/aurora/executor/common/test_health_checker.py ---------------------------------------------------------------------- diff --git a/src/test/python/apache/aurora/executor/common/test_health_checker.py b/src/test/python/apache/aurora/executor/common/test_health_checker.py index 8561abc..9bebce8 100644 --- a/src/test/python/apache/aurora/executor/common/test_health_checker.py +++ b/src/test/python/apache/aurora/executor/common/test_health_checker.py @@ -12,6 +12,7 @@ # limitations under the License. # +import json import os.path import threading import time @@ -23,7 +24,12 @@ from twitter.common.exceptions import ExceptionalThread from twitter.common.testing.clock import ThreadedClock from apache.aurora.common.health_check.http_signaler import HttpSignaler -from apache.aurora.config.schema.base import HealthCheckConfig +from apache.aurora.config.schema.base import ( + HealthCheckConfig, + HealthCheckerConfig, + HttpHealthChecker, + ShellHealthChecker +) from apache.aurora.executor.common.health_checker import ( HealthChecker, HealthCheckerProvider, @@ -206,28 +212,65 @@ class TestHealthCheckerProvider(unittest.TestCase): hct_max_fail = health_checker.threaded_health_checker.max_consecutive_failures assert hct_max_fail == max_consecutive_failures - def test_from_assigned_task_generic(self): + def test_from_assigned_task_http_endpoint_style_config(self): + interval_secs = 17 + initial_interval_secs = 3 + max_consecutive_failures = 2 + http_config = HttpHealthChecker( + endpoint='/foo', + expected_response='bar', + expected_response_code=201 + ) + task_config = TaskConfig( + executorConfig=ExecutorConfig( + name='thermos', + data=MESOS_JOB( + task=HELLO_WORLD, + health_check_config=HealthCheckConfig( + health_checker=HealthCheckerConfig(http=http_config), + interval_secs=interval_secs, + initial_interval_secs=initial_interval_secs, + max_consecutive_failures=max_consecutive_failures, + timeout_secs=7 + ) + ).json_dumps() + ) + ) + assigned_task = AssignedTask(task=task_config, instanceId=1, assignedPorts={'health': 9001}) + execconfig_data = json.loads(assigned_task.task.executorConfig.data) + http_exec_config = execconfig_data['health_check_config']['health_checker']['http'] + assert http_exec_config['endpoint'] == '/foo' + assert http_exec_config['expected_response'] == 'bar' + assert http_exec_config['expected_response_code'] == 201 + health_checker = HealthCheckerProvider().from_assigned_task(assigned_task, None) + assert health_checker.threaded_health_checker.interval == interval_secs + assert health_checker.threaded_health_checker.initial_interval == initial_interval_secs + + def test_from_assigned_task_shell(self): interval_secs = 17 initial_interval_secs = 3 max_consecutive_failures = 2 timeout_secs = 5 + shell_config = ShellHealthChecker(shell_command='failed command') task_config = TaskConfig( executorConfig=ExecutorConfig( name='thermos-generic', data=MESOS_JOB( task=HELLO_WORLD, health_check_config=HealthCheckConfig( + health_checker=HealthCheckerConfig(shell=shell_config), interval_secs=interval_secs, initial_interval_secs=initial_interval_secs, max_consecutive_failures=max_consecutive_failures, timeout_secs=timeout_secs, - type='shell', - shell_command='failed command' ) ).json_dumps() ) ) - assigned_task = AssignedTask(task=task_config, instanceId=1, assignedPorts={'health': 9001}) + assigned_task = AssignedTask(task=task_config, instanceId=1) + execconfig_data = json.loads(assigned_task.task.executorConfig.data) + assert execconfig_data[ + 'health_check_config']['health_checker']['shell']['shell_command'] == 'failed command' health_checker = HealthCheckerProvider().from_assigned_task(assigned_task, None) assert health_checker.threaded_health_checker.interval == interval_secs assert health_checker.threaded_health_checker.initial_interval == initial_interval_secs http://git-wip-us.apache.org/repos/asf/aurora/blob/b3f10096/src/test/sh/org/apache/aurora/e2e/http/http_example_bad_healthcheck.aurora ---------------------------------------------------------------------- diff --git a/src/test/sh/org/apache/aurora/e2e/http/http_example_bad_healthcheck.aurora b/src/test/sh/org/apache/aurora/e2e/http/http_example_bad_healthcheck.aurora index 37f2e9c..c709b03 100644 --- a/src/test/sh/org/apache/aurora/e2e/http/http_example_bad_healthcheck.aurora +++ b/src/test/sh/org/apache/aurora/e2e/http/http_example_bad_healthcheck.aurora @@ -33,11 +33,13 @@ test_task = Task( update_config = UpdateConfig(watch_secs=10, batch_size=2) # "I am going to fail" config. +shell_config = ShellHealthChecker( + shell_command='grep foo' + ) health_check_config = HealthCheckConfig( + health_checker=HealthCheckerConfig(shell=shell_config), initial_interval_secs=5, interval_secs=1, - type='shell', - shell_command='grep foo' ) job = Service(
