Re: Rate-limiting agent removal w/ PARTITION_AWARE

2016-08-01 Thread Vinod Kone
Rate limiting task kills (or, more specifically, framework shutdowns on agents) for non-partition-aware frameworks sounds good to me. I would like us to preserve as much backwards compatibility as possible here. We can update the flags' help text to say that they do not apply to partition-aware frameworks.
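For illustration only, here is a minimal Python sketch of the idea above: framework shutdowns on an agent are rate limited for frameworks that are not partition-aware, while partition-aware frameworks bypass the limiter because nothing is removed for them. This is not Mesos code. `TokenBucket`, `decide_shutdown`, and the returned strings are hypothetical names, and denying a request when the budget is exhausted is a simplification (a real limiter might queue and delay instead).

```python
class TokenBucket:
    """Hypothetical limiter: permits about `capacity` actions per `period_secs`."""

    def __init__(self, capacity: int, period_secs: float, now: float = 0.0):
        self.capacity = capacity
        self.period_secs = period_secs
        self.tokens = float(capacity)  # start with a full budget
        self.last = now

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        refill = elapsed * self.capacity / self.period_secs
        self.tokens = min(float(self.capacity), self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def decide_shutdown(partition_aware: bool, limiter: TokenBucket, now: float) -> str:
    """Decide what to do for one framework on an unreachable agent (assumed model)."""
    if partition_aware:
        # Assumption: no shutdown happens for partition-aware frameworks,
        # so the limiter is never consulted for them.
        return "skip"
    # Non-partition-aware frameworks: shut down only if the rate limit allows it.
    return "shutdown" if limiter.try_acquire(now) else "defer"


if __name__ == "__main__":
    limiter = TokenBucket(capacity=1, period_secs=60.0)
    for t, aware in [(0.0, False), (1.0, False), (1.0, True), (61.0, False)]:
        print(t, aware, decide_shutdown(aware, limiter, t))
```

The clock is passed in explicitly so the behavior is deterministic and easy to check; a deferred shutdown would simply be retried later once the budget refills.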

Re: Rate-limiting agent removal w/ PARTITION_AWARE

2016-07-30 Thread Neil Conway
Hi Ben,
Thanks for the feedback! Seems like we're on the same page overall.
On Thu, Jul 28, 2016 at 8:42 AM, Benjamin Mahler wrote:
> It seems to me that these particular flags are not applicable for
> PARTITION_AWARE frameworks, since there is no removal occurring.
FWIW,

Re: Rate-limiting agent removal w/ PARTITION_AWARE

2016-07-28 Thread Benjamin Mahler
It seems to me that these particular flags are not applicable for PARTITION_AWARE frameworks, since there is no removal occurring. For old frameworks, they still act as if removal is occurring, so these flags provide backwards compatibility by rate limiting the shutting down of

Rate-limiting agent removal w/ PARTITION_AWARE

2016-07-27 Thread Neil Conway
Hi folks,
There are two "safety limits" in place that control the master's agent removal behavior:
(1) "--agent_removal_rate_limit" controls the rate at which agents can be removed from the cluster when they fail health checks.
(2) "--recovery_agent_removal_limit" controls the fraction of