[
https://issues.apache.org/jira/browse/MESOS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247043#comment-15247043
]
José Guilherme Vanz commented on MESOS-1575:
--------------------------------------------
I changed the default failover timeout to 1 week:
{code}
[vanz@london build]$ git diff
diff --git a/include/mesos/mesos.proto b/include/mesos/mesos.proto
index 87af4a0..3af43b7 100644
--- a/include/mesos/mesos.proto
+++ b/include/mesos/mesos.proto
@@ -228,7 +228,7 @@ message FrameworkInfo {
//
// NOTE: To avoid accidental destruction of tasks, production
// frameworks typically set this to a large value (e.g., 1 week).
- optional double failover_timeout = 4 [default = 0.0];
+ optional double failover_timeout = 4 [default = 604800];
// If set, framework pid, executor pids and status updates are
// checkpointed to disk by the slaves. Checkpointing allows a
diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
index 34da0a1..a576f11 100644
--- a/include/mesos/v1/mesos.proto
+++ b/include/mesos/v1/mesos.proto
@@ -228,7 +228,7 @@ message FrameworkInfo {
//
// NOTE: To avoid accidental destruction of tasks, production
// frameworks typically set this to a large value (e.g., 1 week).
- optional double failover_timeout = 4 [default = 0.0];
+ optional double failover_timeout = 4 [default = 604800];
// If set, framework pid, executor pids and status updates are
// checkpointed to disk by the agents. Checkpointing allows a
{code}
As a result, the new default failover timeout is used when the framework disconnects:
{code}
I0418 23:34:10.686487 11890 master.cpp:1375] Framework
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at
[email protected]:41977 disconnected
I0418 23:34:10.686553 11890 master.cpp:2764] Disconnecting framework
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at
[email protected]:41977
I0418 23:34:10.686590 11890 master.cpp:2788] Deactivating framework
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at
[email protected]:41977
W0418 23:34:10.686745 11890 master.cpp:1394] Using the default value for
'failover_timeout' because the input value is invalid: Argument out of the
range that a Duration can represent due to int64_t's size limit
I0418 23:34:10.686766 11890 master.cpp:1399] Giving framework
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at
[email protected]:41977 1weeks to failover
I0418 23:34:10.686920 11890 hierarchical.cpp:375] Deactivated framework
0445e63c-c455-4c88-893e-61d740493432-0000
{code}
Should I change the code to validate the value instead of falling back to the default?
Which approach do you think is better?
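To make the validation option concrete, here is a rough sketch of the range check in Java, since the report uses the Java API. It is not the Mesos implementation. It assumes, based on the warning text and the "0ns" in the original log, that a Duration is stored as signed 64-bit nanoseconds. The class name FailoverTimeoutCheck and its methods are hypothetical, and rejecting negative values is my own choice for the sketch.

```java
// Hypothetical helper, not part of Mesos: range-checks a failover timeout in seconds.
final class FailoverTimeoutCheck {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Assumption: durations are held as signed 64-bit nanoseconds, so this is the
    // largest whole number of seconds that still fits (9223372036, about 292 years).
    static final long MAX_SECONDS = Long.MAX_VALUE / NANOS_PER_SECOND;

    // True if the timeout can be converted to nanoseconds without overflowing.
    static boolean isRepresentable(double seconds) {
        return !Double.isNaN(seconds) && seconds >= 0.0 && seconds <= MAX_SECONDS;
    }

    // Rejects the value outright instead of silently substituting a default.
    static void validate(double seconds) {
        if (!isRepresentable(seconds)) {
            throw new IllegalArgumentException(
                "failover_timeout of " + seconds + " seconds is out of range; "
                + "the maximum is " + MAX_SECONDS + " seconds");
        }
    }

    public static void main(String[] args) {
        System.out.println(isRepresentable(604800.0));                // true: 1 week
        System.out.println(isRepresentable((double) Long.MAX_VALUE)); // false: overflows
    }
}
```

With a check like this, the Long.MAX_VALUE passed to setFailoverTimeout in the report would be refused at registration time, which is what the reporter asked for, rather than being replaced by whatever the default happens to be.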
> master sets failover timeout to 0 when framework requests a high value
> ----------------------------------------------------------------------
>
> Key: MESOS-1575
> URL: https://issues.apache.org/jira/browse/MESOS-1575
> Project: Mesos
> Issue Type: Bug
> Reporter: Kevin Sweeney
> Assignee: José Guilherme Vanz
> Labels: newbie, twitter
>
> In response to a registered RPC we observed the following behavior:
> {noformat}
> W0709 19:07:32.982997 11400 master.cpp:612] Using the default value for
> 'failover_timeout' becausethe input value is invalid: Argument out of the
> range that a Duration can represent due to int64_t's size limit
> I0709 19:07:32.983008 11404 hierarchical_allocator_process.hpp:408]
> Deactivated framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983013 11400 master.cpp:617] Giving framework
> 20140709-184342-119646400-5050-11380-0003 0ns to failover
> I0709 19:07:32.983271 11404 master.cpp:2201] Framework failover timeout,
> removing framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983294 11404 master.cpp:2688] Removing framework
> 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983678 11404 hierarchical_allocator_process.hpp:363] Removed
> framework 20140709-184342-119646400-5050-11380-0003
> {noformat}
> This was using the following frameworkInfo.
> {code}
> FrameworkInfo frameworkInfo = FrameworkInfo.newBuilder()
> .setUser("test")
> .setName("jvm")
> .setFailoverTimeout(Long.MAX_VALUE)
> .build();
> {code}
> Instead of silently defaulting large values to 0 the master should refuse to
> process the request.