Neil Conway created MESOS-7333:
----------------------------------
Summary: Clarify log message when agent removal rate limit is
applied
Key: MESOS-7333
URL: https://issues.apache.org/jira/browse/MESOS-7333
Project: Mesos
Issue Type: Bug
Components: master
Reporter: Neil Conway
When the master begins to mark an agent unreachable and the agent removal rate
limit is set, we log:
{noformat}
Scheduling removal of agent 07ae6114-a59a-41d5-a3d5-32e6681eb17d-S2 at
slave(1)@192.168.10.45:5051 (192.168.10.45); did not re-register within 10mins
after disconnecting
{noformat}
This can be improved. The important question for an operator is: _how long will
it take for the agent to be removed?_ If this removal stays within the rate
limit, the agent will be removed immediately; if it does not, the removal might
be delayed for a long time. It would be great to distinguish between these two
cases in the log output.
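To make the two cases concrete, here is a minimal sketch that models a rate limit of `permits` removals per `duration` as evenly spaced removal slots (a leaky-bucket style limiter with no burst allowance). `RemovalRateLimiter` and `reserve()` are hypothetical names, not Mesos or libprocess APIs, and the even spacing is a simplifying assumption; the master's actual rate limiter may behave differently. Times are seconds since an arbitrary origin so the example stays deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Seconds = std::chrono::duration<double>;

// Hypothetical model of an agent removal rate limit of `permits` removals
// per `duration`: consecutive removals are spaced duration/permits apart.
class RemovalRateLimiter {
 public:
  RemovalRateLimiter(int permits, Seconds duration)
      : interval_((assert(permits > 0), duration / permits)) {}

  // Reserves the next free removal slot at time `now` and returns how long
  // the caller must wait before removing the agent. A zero result means the
  // removal stays within the rate limit and can happen immediately.
  Seconds reserve(Seconds now) {
    Seconds slot = std::max(now, next_);
    next_ = slot + interval_;
    return slot - now;
  }

 private:
  Seconds interval_;  // Minimum spacing between two removals.
  Seconds next_{0};   // Earliest time the next removal may happen.
};
```

With a limit of 1/10mins (`RemovalRateLimiter(1, Seconds(600))`) and three agents marked unreachable at the same moment, `reserve()` returns 0, 600, and 1200 seconds: the first agent is removed immediately while the other two wait. That returned delay is exactly the "how long will it take" figure an operator would want in the log.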
For example: if the rate limit is configured but we're going to remove the
agent immediately anyway, just log the same output we normally do (skip
"Scheduling..."). If instead the rate limit is going to delay removing the
agent, we should (a) make that _clear_ in the output and (b) ideally include
some prediction of how long it will take for the agent to be removed. Something
like: "Scheduling removal of agent ABC; agent removal rate limit of X is in
effect, waiting Y until removing the agent."