[ https://issues.apache.org/jira/browse/MESOS-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Chen updated MESOS-2182: -------------------------------- Fix Version/s: (was: 0.22.0) 0.21.1 > Performance issue in libprocess SocketManager. > ---------------------------------------------- > > Key: MESOS-2182 > URL: https://issues.apache.org/jira/browse/MESOS-2182 > Project: Mesos > Issue Type: Bug > Components: libprocess > Affects Versions: 0.21.0 > Reporter: Benjamin Mahler > Assignee: Benjamin Mahler > Priority: Blocker > Fix For: 0.21.1 > > > Noticed an issue in production under which the master is slow to respond > after failover for ~15 minutes. > After looking at some perf data, the top offender is: > {noformat} > 12.02% mesos-master libmesos-0.21.0-rc3.so [.] > std::_Rb_tree<process::ProcessBase*, process::ProcessBase*, > std::_Identity<process::ProcessBase*>, std::less<process::ProcessBase*>, > std::allocator<process::ProcessBase*> >::erase(process::ProcessBase* const&) > ... > 3.29% mesos-master libmesos-0.21.0-rc3.so [.] > process::SocketManager::exited(process::ProcessBase*) > {noformat} > It appears that in the SocketManager, whenever an internal Process exits, we > loop over all the links unnecessarily: > {code} > void SocketManager::exited(ProcessBase* process) > { > // An exited event is enough to cause the process to get deleted > // (e.g., by the garbage collector), which means we can't > // dereference process (or even use the address) after we enqueue at > // least one exited event. Thus, we save the process pid. > const UPID pid = process->pid; > // Likewise, we need to save the current time of the process so we > // can update the clocks of linked processes as appropriate. > const Time time = Clock::now(process); > synchronized (this) { > // Iterate through the links, removing any links the process might > // have had and creating exited events for any linked processes. > foreachpair (const UPID& linkee, set<ProcessBase*>& processes, links) { > processes.erase(process); > if (linkee == pid) { > foreach (ProcessBase* linker, processes) { > CHECK(linker != process) << "Process linked with itself"; > synchronized (timeouts) { > if (Clock::paused()) { > Clock::update(linker, time); > } > } > linker->enqueue(new ExitedEvent(linkee)); > } > } > } > links.erase(pid); > } > } > {code} > On clusters with 10,000s of slaves, this means we hold the socket manager > lock for a very expensive loop erasing nothing from a set! This is because, > the master contains links from the Master Process to each slave. However, > when a random ephemeral Process terminates, we don't need to loop over each > slave link. > While we hold this lock, the following calls will block: > {code} > class SocketManager > { > public: > Socket accepted(int s); > void link(ProcessBase* process, const UPID& to); > PID<HttpProxy> proxy(const Socket& socket); > void send(Encoder* encoder, bool persist); > void send(const Response& response, > const Request& request, > const Socket& socket); > void send(Message* message); > Encoder* next(int s); > void close(int s); > void exited(const Node& node); > void exited(ProcessBase* process); > ... > {code} > As a result, the slave observers and the master can block calling send()! > Short term, we will try to fix this issue by removing the unnecessary > looping. Longer term, it would be nice to avoid all this locking when sending > on independent sockets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)