[ 
https://issues.apache.org/jira/browse/MESOS-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295240#comment-15295240
 ] 

José Guilherme Vanz commented on MESOS-5359:
--------------------------------------------

Looking at the related issue MESOS-5330 and the scheduler library (i.e., the 
MesosSchedulerDriver and SchedulerProcess classes), I think the change belongs 
in the SchedulerProcess class. Is that right?
Based on the MESOS-5330 solution, I would move the link() call out of the 
detected() method and into the authenticate() and doReliableRegistration() 
methods: 

{code:title=sched.cpp|borderStyle=solid}
    if (master.isSome()) {
      LOG(INFO) << "New master detected at " << master.get().pid();
      link(master.get().pid());

      if (credential.isSome()) {
        // Authenticate with the master.
        // TODO(vinod): Do a backoff for authentication similar to what
        // we do for registration.
        authenticate();
      } else {
{code}
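
To make the proposed move concrete, here is a minimal sketch. It does not use the real SchedulerProcess or libprocess: SchedulerProcessSketch, ensureLinked() and the calls log are hypothetical stand-ins, an empty string stands in for "no master detected", and linking only once per detected master is an assumption of the sketch rather than something taken from sched.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for SchedulerProcess. It only records the order of
// calls so the effect of moving link() can be inspected; it is NOT the real
// class from sched.cpp and does not use libprocess.
class SchedulerProcessSketch {
public:
  explicit SchedulerProcessSketch(bool hasCredential)
    : hasCredential_(hasCredential), linked_(false) {}

  // Called when a new leading master is detected.
  // Assumption: an empty string means "no master".
  void detected(const std::string& masterPid) {
    master_ = masterPid;
    linked_ = false;  // A new master means any previous link is stale.
    if (master_.empty()) {
      return;
    }
    calls.push_back("detected " + master_);

    // Note: no link() here any more. The connection is only opened by
    // whichever step actually needs to talk to the master.
    if (hasCredential_) {
      authenticate();
    } else {
      doReliableRegistration();
    }
  }

  void authenticate() {
    if (master_.empty()) {
      return;
    }
    ensureLinked();
    calls.push_back("authenticate");
  }

  void doReliableRegistration() {
    if (master_.empty()) {
      return;
    }
    ensureLinked();
    calls.push_back("register");
  }

  // Ordered log of what the sketch did (hypothetical, for inspection only).
  std::vector<std::string> calls;

private:
  // Hypothetical helper: link at most once per detected master.
  void ensureLinked() {
    assert(!master_.empty());
    if (!linked_) {
      calls.push_back("link " + master_);
      linked_ = true;
    }
  }

  bool hasCredential_;
  bool linked_;
  std::string master_;
};
```

With a credential set, calling detected() with a master address records the detection, then the link, then the authentication; a later doReliableRegistration() on the same master records only the registration, because the link already exists. Deferring the link this way would also leave a single place to insert a delay before the connection is opened.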

Am I on the right path?




> The scheduler library should have a delay before initiating a connection with 
> master.
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-5359
>                 URL: https://issues.apache.org/jira/browse/MESOS-5359
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.29.0
>            Reporter: Anand Mazumdar
>            Assignee: José Guilherme Vanz
>              Labels: mesosphere
>
> Currently, the scheduler library does not have an artificially induced delay 
> when trying to initially establish a connection with the master. In the event 
> of a master failover or ZK disconnect, a large number of frameworks can get 
> disconnected and thereby overwhelm the master with TCP SYN requests. 
> On a large cluster with many agents, the master is already overwhelmed with 
> handling connection requests from the agents. This compounds the issue 
> further on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
