Failover timeout should default to 0
------------------------------------
Key: MESOS-106
URL: https://issues.apache.org/jira/browse/MESOS-106
Project: Mesos
Issue Type: Improvement
Reporter: Matei Zaharia
Since the failover timeout was added, its long default value of 1 day has been
causing a lot of weird behavior in clusters running frameworks that don't
support failover. If a framework fails, or just exits without calling
driver.stop(), all of its executors stay around until the timeout expires and
keep consuming resources on the machines, so subsequent runs mysteriously fail
to acquire resources.
See http://groups.google.com/group/spark-users/msg/553af12424e4ed3d for an
example. I know that the failover timeout is supposed to eventually become a
per-framework parameter anyway, but in the meantime, the easiest way to prevent
these stuck executors is to change the default to 0, because almost no users
have failover-enabled frameworks.
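
To make the effect concrete, here is a toy model in Python. It is not Mesos
code: the Framework class, the cpus_still_held function, and all of the numbers
are hypothetical, and it assumes only that a disconnected framework's executors
hold their resources until the failover timeout has elapsed.

```python
from dataclasses import dataclass

ONE_DAY_SECS = 24 * 60 * 60  # the current default described in this issue


@dataclass
class Framework:
    """Hypothetical record of a framework whose scheduler has gone away."""
    name: str
    disconnected_at: float  # seconds, on an arbitrary clock
    cpus_held: int          # CPUs still used by its leftover executors


def cpus_still_held(frameworks, now, failover_timeout_secs):
    """Sum the CPUs pinned by frameworks still inside the failover window."""
    return sum(
        f.cpus_held
        for f in frameworks
        if now - f.disconnected_at < failover_timeout_secs
    )


# A framework that exited without calling driver.stop(), checked 10 minutes later.
crashed = [Framework("example-job", disconnected_at=0.0, cpus_held=8)]

held_with_one_day = cpus_still_held(crashed, now=600.0,
                                    failover_timeout_secs=ONE_DAY_SECS)
held_with_zero = cpus_still_held(crashed, now=600.0,
                                 failover_timeout_secs=0)

print(held_with_one_day, held_with_zero)
```

In this model the 1 day default leaves all 8 CPUs unavailable to the next run,
while a default of 0 frees them as soon as the framework disconnects.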