[ 
https://issues.apache.org/jira/browse/MESOS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247043#comment-15247043
 ] 

José Guilherme Vanz commented on MESOS-1575:
--------------------------------------------

I changed the default failover timeout to 1 week:

{code}
[vanz@london build]$ git diff
diff --git a/include/mesos/mesos.proto b/include/mesos/mesos.proto
index 87af4a0..3af43b7 100644
--- a/include/mesos/mesos.proto
+++ b/include/mesos/mesos.proto
@@ -228,7 +228,7 @@ message FrameworkInfo {
   //
   // NOTE: To avoid accidental destruction of tasks, production
   // frameworks typically set this to a large value (e.g., 1 week).
-  optional double failover_timeout = 4 [default = 0.0];
+  optional double failover_timeout = 4 [default = 604800];
 
   // If set, framework pid, executor pids and status updates are
   // checkpointed to disk by the slaves. Checkpointing allows a
diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
index 34da0a1..a576f11 100644
--- a/include/mesos/v1/mesos.proto
+++ b/include/mesos/v1/mesos.proto
@@ -228,7 +228,7 @@ message FrameworkInfo {
   //
   // NOTE: To avoid accidental destruction of tasks, production
   // frameworks typically set this to a large value (e.g., 1 week).
-  optional double failover_timeout = 4 [default = 0.0];
+  optional double failover_timeout = 4 [default = 604800];
 
   // If set, framework pid, executor pids and status updates are
   // checkpointed to disk by the agents. Checkpointing allows a
{code}

As a result, the new default failover timeout (1 week) is used when the framework disconnects:

{code}
I0418 23:34:10.686487 11890 master.cpp:1375] Framework 
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at 
[email protected]:41977 disconnected
I0418 23:34:10.686553 11890 master.cpp:2764] Disconnecting framework 
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at 
[email protected]:41977
I0418 23:34:10.686590 11890 master.cpp:2788] Deactivating framework 
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at 
[email protected]:41977
W0418 23:34:10.686745 11890 master.cpp:1394] Using the default value for 
'failover_timeout' because the input value is invalid: Argument out of the 
range that a Duration can represent due to int64_t's size limit
I0418 23:34:10.686766 11890 master.cpp:1399] Giving framework 
0445e63c-c455-4c88-893e-61d740493432-0000 (Rendler Framework (Java)) at 
[email protected]:41977 1weeks to failover
I0418 23:34:10.686920 11890 hierarchical.cpp:375] Deactivated framework 
0445e63c-c455-4c88-893e-61d740493432-0000
{code}

Should I change the code to validate the value and reject the request, instead 
of falling back to the default value? Which approach do you think is better?
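
To make the validation option concrete, here is a minimal standalone sketch of the range check. It is not the actual master code: the names `TimeoutCheck`, `kMaxTimeoutSeconds` and `validateFailoverTimeout` are hypothetical, and it assumes (based on the warning above and the "0ns" in the original report) that a Duration is stored as a signed 64-bit count of nanoseconds.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical outcome of checking a requested failover timeout.
enum class TimeoutCheck { OK, NOT_A_NUMBER, NEGATIVE, TOO_LARGE };

// Assumption: durations are kept as int64_t nanoseconds, so the
// largest timeout in seconds is roughly 9.22e9 (about 292 years).
constexpr double kMaxTimeoutSeconds =
    static_cast<double>(std::numeric_limits<int64_t>::max()) / 1e9;

// Classifies a timeout given in seconds (the unit of the
// 'failover_timeout' field) without silently replacing it.
TimeoutCheck validateFailoverTimeout(double seconds) {
  if (std::isnan(seconds)) {
    return TimeoutCheck::NOT_A_NUMBER;
  }
  if (seconds < 0.0) {
    return TimeoutCheck::NEGATIVE;
  }
  // '>=' is deliberate: the bound itself is rounded up when converted
  // to double, so a value equal to it may already overflow.
  if (seconds >= kMaxTimeoutSeconds) {
    return TimeoutCheck::TOO_LARGE;
  }
  return TimeoutCheck::OK;
}
```

With a check like this, 604800 (1 week) would be accepted, while the Long.MAX_VALUE from the report would come back as TOO_LARGE, and the master could refuse the registration with an explicit error instead of substituting the default.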

> master sets failover timeout to 0 when framework requests a high value
> ----------------------------------------------------------------------
>
>                 Key: MESOS-1575
>                 URL: https://issues.apache.org/jira/browse/MESOS-1575
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Kevin Sweeney
>            Assignee: José Guilherme Vanz
>              Labels: newbie, twitter
>
> In response to a registered RPC we observed the following behavior:
> {noformat}
> W0709 19:07:32.982997 11400 master.cpp:612] Using the default value for 
> 'failover_timeout' because the input value is invalid: Argument out of the 
> range that a Duration can represent due to int64_t's size limit
> I0709 19:07:32.983008 11404 hierarchical_allocator_process.hpp:408] 
> Deactivated framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983013 11400 master.cpp:617] Giving framework 
> 20140709-184342-119646400-5050-11380-0003 0ns to failover
> I0709 19:07:32.983271 11404 master.cpp:2201] Framework failover timeout, 
> removing framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983294 11404 master.cpp:2688] Removing framework 
> 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983678 11404 hierarchical_allocator_process.hpp:363] Removed 
> framework 20140709-184342-119646400-5050-11380-0003
> {noformat}
> This was using the following frameworkInfo.
> {code}
>     FrameworkInfo frameworkInfo = FrameworkInfo.newBuilder()
>         .setUser("test")
>         .setName("jvm")
>         .setFailoverTimeout(Long.MAX_VALUE)
>         .build();
> {code}
> Instead of silently defaulting large values to 0, the master should refuse to 
> process the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
