Rate limiting task kills (or, more specifically, framework shutdowns on
agents) for non-partition-aware frameworks sounds good to me. I would like
us to keep as much backwards compatibility as possible here. We can update
the flags' help text to say that they don't apply to partition-aware
frameworks.
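
To make the proposal concrete, here is a minimal sketch in Python of how
such gating might look. It is not the Mesos master's implementation: the
names Framework, RateLimiter and frameworks_to_shut_down are hypothetical,
and the fixed-window limiter is an assumption chosen for brevity.
Partition-aware frameworks are skipped because no removal occurs for them;
everything else has to obtain a permit before being shut down.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    # Hypothetical stand-in for a registered framework.
    name: str
    partition_aware: bool


class RateLimiter:
    """Fixed-window limiter: at most `permits` actions per `interval_secs`.

    Hypothetical helper for illustration only.
    """

    def __init__(self, permits, interval_secs):
        self.permits = permits
        self.interval_secs = interval_secs
        self.window_start = None
        self.used = 0

    def try_acquire(self, now):
        # Open a fresh window once the previous one has expired.
        if self.window_start is None or now - self.window_start >= self.interval_secs:
            self.window_start = now
            self.used = 0
        if self.used < self.permits:
            self.used += 1
            return True
        return False


def frameworks_to_shut_down(frameworks, limiter, now):
    """Split the frameworks on a failed agent into (shut_down, deferred)."""
    shut_down, deferred = [], []
    for fw in frameworks:
        if fw.partition_aware:
            # No removal occurs for these, so the limit does not apply.
            continue
        if limiter.try_acquire(now):
            shut_down.append(fw.name)
        else:
            deferred.append(fw.name)
    return shut_down, deferred
```

With a limit of one shutdown per ten minutes, two old-style frameworks on
the same agent would be handled one window apart, while a partition-aware
framework on that agent is left alone.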
Hi Ben,
Thanks for the feedback! Seems like we're on the same page overall.
On Thu, Jul 28, 2016 at 8:42 AM, Benjamin Mahler wrote:
> It seems to me that these particular flags are not applicable for
> PARTITION_AWARE frameworks, since there is no removal occurring.
FWIW,
It seems to me that these particular flags are not applicable for
PARTITION_AWARE frameworks, since there is no removal occurring. For old
frameworks, they still act as if removal is occurring and so these flags
provide the backwards compatibility by rate limiting the shutting down of
Hi folks,
There are two "safety limits" in place that control the master's agent
removal behavior:
(1) "--agent_removal_rate_limit" controls the rate at which agents can
be removed from the cluster when they fail health checks.
(2) "--recovery_agent_removal_limit" controls the fraction of