[
https://issues.apache.org/jira/browse/MESOS-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549654#comment-14549654
]
Benjamin Mahler commented on MESOS-2507:
----------------------------------------
https://reviews.apache.org/r/34387/
https://reviews.apache.org/r/34388/
https://reviews.apache.org/r/34389/
> Performance issue in the master when a large number of slaves are registering.
> ------------------------------------------------------------------------------
>
> Key: MESOS-2507
> URL: https://issues.apache.org/jira/browse/MESOS-2507
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Benjamin Mahler
> Assignee: Benjamin Mahler
> Labels: scalability, twitter
>
> In large clusters, when many slaves register at the same time, the master
> becomes backlogged processing registration requests. A {{perf}} profile of
> the master revealed the following:
> {code}
> Events: 14K cycles
> 25.44% libmesos-0.22.0-x.so [.] mesos::internal::master::Master::registerSlave(process::UPID const&, mesos::SlaveInfo const&, std::vector<mesos::Resource, std::allocator<mesos::Resource> > cons
> 11.18% libmesos-0.22.0-x.so [.] pipecb
> 5.88% libc-2.5.so [.] malloc_consolidate
> 5.33% libc-2.5.so [.] _int_free
> 5.25% libc-2.5.so [.] malloc
> 5.23% libc-2.5.so [.] _int_malloc
> 4.11% libstdc++.so.6.0.8 [.] std::string::assign(std::string const&)
> 3.22% libmesos-0.22.0-x.so [.] mesos::Resource::SharedDtor()
> 3.10% [kernel] [k] _raw_spin_lock
> 1.97% libmesos-0.22.0-x.so [.] mesos::Attribute::SharedDtor()
> 1.28% libc-2.5.so [.] memcmp
> 1.08% libc-2.5.so [.] free
> {code}
> This is likely because we loop over all registered slaves for each
> registration request, so the cost of each request grows with the number of
> slaves already registered:
> {code}
> void Master::registerSlave(
>     const UPID& from,
>     const SlaveInfo& slaveInfo,
>     const vector<Resource>& checkpointedResources,
>     const string& version)
> {
>   // ...
>   // Check if this slave is already registered (because it retries).
>   foreachvalue (Slave* slave, slaves.registered) {
>     if (slave->pid == from) {
>       // ...
>     }
>   }
>   // ...
> }
> {code}
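
One way the per-registration scan quoted above could be avoided is to keep an index keyed by pid, so the "already registered" check becomes a hash lookup. The following is only a minimal sketch under stated assumptions: `SlaveRegistry`, its `byPid_` map, and the use of `std::string` in place of `process::UPID` are hypothetical names chosen for illustration, and are not taken from the Mesos source or from the linked reviews.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the master's per-slave state. Only the field
// needed for the lookup is modeled here.
struct Slave {
  std::string pid;  // Stand-in for process::UPID (assumption).
};

// Hypothetical registry that maintains a pid -> slave index, so checking
// whether a slave is already registered does not iterate over every slave.
class SlaveRegistry {
public:
  // Records a slave under its pid. Does not take ownership.
  void add(Slave* slave) {
    assert(slave != nullptr);
    byPid_[slave->pid] = slave;
  }

  // Drops the index entry for this slave's pid, if present.
  void remove(const Slave* slave) {
    assert(slave != nullptr);
    byPid_.erase(slave->pid);
  }

  // Returns the slave registered under `pid`, or nullptr if there is none.
  // Average O(1), instead of O(number of registered slaves).
  Slave* find(const std::string& pid) const {
    auto it = byPid_.find(pid);
    return it == byPid_.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::string, Slave*> byPid_;
};
```

With an index like this, the retry check in `registerSlave` would reduce to a single `find(from)` call. The index has to be updated wherever slaves are added or removed, which is the main cost of this design.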