[ https://issues.apache.org/jira/browse/MESOS-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391263#comment-14391263 ]

Cody Maloney commented on MESOS-2507:
-------------------------------------

Looking at the code, a couple of other things pop out as likely perf problems:
Doing .contains() separately from .insert() causes the hash table to be
searched twice, and each search walks a bucket's linked list, which is slow.
It would be good to add an "insert or fail" API to hashset.
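A minimal sketch of the "insert or fail" idea, using std::unordered_set as a stand-in for the master's hashset (whether stout's hashset exposes the same return value is an assumption here, and insertIfAbsent is a hypothetical helper name). The standard insert() already reports whether the element was new, so one hash lookup can replace the contains() + insert() pair:

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: inserts `pid` and returns true only if it was not
// already present. std::unordered_set::insert returns a pair whose `second`
// member is false when an equal element already existed, so this does a
// single hash lookup instead of contains() followed by insert().
bool insertIfAbsent(std::unordered_set<std::string>& set, const std::string& pid)
{
  return set.insert(pid).second;
}
```

A first call with a given pid returns true; a retried registration from the same pid returns false without a second lookup, and the caller can branch on that result.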

Checking "registering" before iterating over "registered" would likely save a
lot of time (although we should measure it): if the master really is
backlogged, the slaves keep resending registration messages, so that loop runs
many times. Checking whether "registering" contains the slave is a single hash
lookup, so it should be cheap.
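A rough sketch of that ordering, under loudly stated assumptions: Slave, Slaves, Action and classify below are hypothetical simplified stand-ins (pids as strings, standard containers instead of stout's hashset/hashmap), not the real Mesos types. The point is only that a retry from a pid already in "registering" is answered before the O(N) scan over "registered":

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for the master's bookkeeping; the real Mesos
// types (Slave, UPID, SlaveID, stout containers) carry much more state.
struct Slave
{
  std::string pid;
};

struct Slaves
{
  std::unordered_set<std::string> registering;         // pids mid-registration
  std::unordered_map<std::string, Slave*> registered;  // keyed by slave ID
};

enum class Action { DropRetry, AlreadyRegistered, StartRegistration };

// Hypothetical sketch: check the cheap hash lookup first, and fall back to
// the linear scan over every registered slave only when it misses.
Action classify(const Slaves& slaves, const std::string& from)
{
  if (slaves.registering.count(from) > 0) {
    return Action::DropRetry;  // Admission already in flight for this pid.
  }

  for (const auto& entry : slaves.registered) {
    if (entry.second->pid == from) {
      return Action::AlreadyRegistered;
    }
  }

  return Action::StartRegistration;
}
```

Under a backlog most incoming messages are retries, so most of them would take the first branch and never touch the loop.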

> Performance issue in the master when a large number of slaves are registering.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-2507
>                 URL: https://issues.apache.org/jira/browse/MESOS-2507
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Benjamin Mahler
>              Labels: scalability, twitter
>
> For large clusters, when a lot of slaves are registering, the master gets 
> backlogged processing registration requests. {{perf}} revealed the following:
> {code}
> Events: 14K cycles
>  25.44%  libmesos-0.22.0-x.so  [.] 
> mesos::internal::master::Master::registerSlave(process::UPID const&, 
> mesos::SlaveInfo const&, std::vector<mesos::Resource, 
> std::allocator<mesos::Resource> > cons
>  11.18%  libmesos-0.22.0-x.so  [.] pipecb
>   5.88%  libc-2.5.so             [.] malloc_consolidate
>   5.33%  libc-2.5.so             [.] _int_free
>   5.25%  libc-2.5.so             [.] malloc
>   5.23%  libc-2.5.so             [.] _int_malloc
>   4.11%  libstdc++.so.6.0.8      [.] std::string::assign(std::string const&)
>   3.22%  libmesos-0.22.0-x.so  [.] mesos::Resource::SharedDtor()
>   3.10%  [kernel]                [k] _raw_spin_lock
>   1.97%  libmesos-0.22.0-x.so  [.] mesos::Attribute::SharedDtor()
>   1.28%  libc-2.5.so             [.] memcmp
>   1.08%  libc-2.5.so             [.] free
> {code}
> This is likely because we loop over all the slaves for each registration:
> {code}
> void Master::registerSlave(
>     const UPID& from,
>     const SlaveInfo& slaveInfo,
>     const vector<Resource>& checkpointedResources,
>     const string& version)
> {
>   // ...
>   // Check if this slave is already registered (because it retries).
>   foreachvalue (Slave* slave, slaves.registered) {
>     if (slave->pid == from) {
>       // ...
>     }
>   }
>   // ...
> }
> {code}
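To put a number on "loop over all the slaves for each registration": if the k-th slave to register is compared against the k slaves already registered, admitting n slaves costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 pid comparisons, before counting retries (each retry repeats a full scan). A small hypothetical model of that count (scanComparisons is an illustrative name, not Mesos code):

```cpp
#include <cstdint>

// Hypothetical model of the quoted loop with no retries: the k-th
// registration compares `from` against the k slaves already registered.
// The sum 0 + 1 + ... + (n - 1) equals n * (n - 1) / 2.
std::uint64_t scanComparisons(std::uint64_t n)
{
  std::uint64_t comparisons = 0;
  for (std::uint64_t k = 0; k < n; ++k) {
    comparisons += k;  // One comparison per already-registered slave.
  }
  return comparisons;
}
```

For n = 10,000 this gives 49,995,000 comparisons, which is why the scan dominates once the cluster is large.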



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
