Neil Conway created MESOS-7333:
----------------------------------

             Summary: Clarify log message when agent rate removal limit is 
applied
                 Key: MESOS-7333
                 URL: https://issues.apache.org/jira/browse/MESOS-7333
             Project: Mesos
          Issue Type: Bug
          Components: master
            Reporter: Neil Conway


When the master begins to mark an agent unreachable and the agent removal rate 
limit is set, we log:

{noformat}
Scheduling removal of agent 07ae6114-a59a-41d5-a3d5-32e6681eb17d-S2 at 
slave(1)@192.168.10.45:5051 (192.168.10.45); did not re-register within 10mins 
after disconnecting
{noformat}

This can be improved. The important question for an operator is: _how long will 
it take for the agent to be removed?_ If the removal fits within the rate 
limit, the agent will be removed immediately; if it does not, the removal might 
not happen for a long time. The log output should distinguish between these two 
cases.

For example, if the rate limit is configured but we're going to remove the 
agent immediately anyway, then just log the same output we normally do (skip 
"Scheduling..."). If the rate limit is going to delay removing the agent, we 
should (a) make that _clear_ in the output and (b) ideally include some 
prediction of how long it will take for the agent to be removed. For example: 
"Scheduling removal of agent ABC; agent removal rate limit of X is in effect, 
waiting Y until removing the agent."
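
The two cases could be told apart roughly as in the sketch below. This is an illustration in Python, not the master's actual code: the function names, the "Removing agent" wording for the immediate case, and the evenly-spaced-permit estimate are all assumptions made for the example.

```python
from datetime import timedelta
from typing import Optional


def _fmt(d: timedelta) -> str:
    """Render a duration as whole minutes when possible, else seconds."""
    secs = int(d.total_seconds())
    return f"{secs // 60}mins" if secs % 60 == 0 else f"{secs}secs"


def estimate_wait(queued_ahead: int, permits: int, interval: timedelta,
                  until_next_permit: timedelta = timedelta(0)) -> timedelta:
    """Rough wait before this removal gets a permit.

    Assumes (hypothetically) that permits are handed out evenly, one every
    interval / permits, and that queued_ahead removals are served first.
    """
    return until_next_permit + queued_ahead * (interval / permits)


def format_removal_message(agent: str, timeout: timedelta,
                           wait: timedelta,
                           rate_limit: Optional[str]) -> str:
    """Pick the log line depending on whether the rate limit delays removal."""
    reason = f"did not re-register within {_fmt(timeout)} after disconnecting"
    if rate_limit is None or wait <= timedelta(0):
        # No delay expected: placeholder wording without "Scheduling...".
        return f"Removing agent {agent}; {reason}"
    # Delay expected: name the limit and the predicted wait.
    return (f"Scheduling removal of agent {agent}; {reason}; "
            f"agent removal rate limit of {rate_limit} is in effect, "
            f"waiting {_fmt(wait)} until removing the agent")
```

With a hypothetical limit of one removal per 10 minutes and three removals already queued, `estimate_wait(3, 1, timedelta(minutes=10))` yields 30 minutes, so the delayed message would include "waiting 30mins". A real estimate would have to come from the rate limiter's own state.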



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
