[
https://issues.apache.org/jira/browse/MESOS-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295240#comment-15295240
]
José Guilherme Vanz commented on MESOS-5359:
--------------------------------------------
After looking at the related issue MESOS-5330 and the scheduler library (i.e.
the MesosSchedulerDriver and SchedulerProcess classes), I think the change
belongs in the SchedulerProcess class. Right?
Based on the MESOS-5330 solution, I would move the link() call out of the
detected() method and into the authenticate() and doReliableRegistration()
methods. The call currently sits here:
{code:title=sched.cpp|borderStyle=solid}
if (master.isSome()) {
  LOG(INFO) << "New master detected at " << master.get().pid();
  link(master.get().pid());
  if (credential.isSome()) {
    // Authenticate with the master.
    // TODO(vinod): Do a backoff for authentication similar to what
    // we do for registration.
    authenticate();
  } else {
{code}
Am I on the right path?
> The scheduler library should have a delay before initiating a connection with
> master.
> -------------------------------------------------------------------------------------
>
> Key: MESOS-5359
> URL: https://issues.apache.org/jira/browse/MESOS-5359
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.29.0
> Reporter: Anand Mazumdar
> Assignee: José Guilherme Vanz
> Labels: mesosphere
>
> Currently, the scheduler library does not have an artificially induced delay
> when trying to initially establish a connection with the master. In the event
> of a master failover or ZK disconnect, a large number of frameworks can get
> disconnected and thereby overwhelm the master with TCP SYN requests.
> On a large cluster with many agents, the master is already overwhelmed with
> handling connection requests from the agents. This compounds the issue
> further on the master.
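
To illustrate the kind of delay the issue asks for, here is a minimal,
standalone sketch of picking a randomized wait before connecting. It is not the
Mesos implementation: randomConnectDelay and its maxDelay parameter are
hypothetical names, and the sketch uses only the C++ standard library rather
than libprocess.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper (not part of Mesos): pick a uniformly random delay in
// the closed range [0, maxDelay]. If every scheduler waits for such a delay
// before linking to a newly detected master, their connection attempts are
// spread over the interval instead of arriving all at once.
std::chrono::milliseconds randomConnectDelay(
    std::chrono::milliseconds maxDelay,
    std::mt19937_64& rng)
{
  std::uniform_int_distribution<std::int64_t> dist(0, maxDelay.count());
  return std::chrono::milliseconds(dist(rng));
}
```

A caller would seed the generator once (for example from std::random_device),
compute the delay when a new master is detected, and only then initiate the
connection. The upper bound is an assumption of this sketch and would need to
be tuned or made configurable.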
