Benjamin Mahler created MESOS-1862:
--------------------------------------
Summary: Performance regression in the Master's http metrics.
Key: MESOS-1862
URL: https://issues.apache.org/jira/browse/MESOS-1862
Project: Mesos
Issue Type: Bug
Components: master
Affects Versions: 0.21.0
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
Priority: Blocker
As part of the change to hold on to terminal unacknowledged tasks in the
master, we introduced a performance regression in the following patch:
https://github.com/apache/mesos/commit/0760b007ad65bc91e8cea377339978c78d36d247
{noformat}
commit 0760b007ad65bc91e8cea377339978c78d36d247
Author: Benjamin Mahler <[email protected]>
Date: Thu Sep 11 10:48:20 2014 -0700
Minor cleanups to the Master code.
Review: https://reviews.apache.org/r/25566
{noformat}
Rather than keeping a running count of allocated resources, we now compute
resources on demand. This was done in order to ignore terminal tasks' resources.
As a result of this change, the /stats.json and /metrics/snapshot endpoints on
the master have slowed down substantially on large clusters.
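
To illustrate the trade-off, here is a minimal sketch of the two approaches. It is not the Master code: {{Task}}, {{Slave}}, {{usedOnDemand()}} and {{usedCached()}} are hypothetical, simplified names, and a single {{double}} stands in for the real {{Resources}} type. The on-demand version walks every task on each call, while the running count is adjusted once when a task is added or becomes terminal.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins; these are not the Mesos types.
struct Task {
  double cpus;
  bool terminal;  // Terminal but unacknowledged tasks are still stored.
};

class Slave {
public:
  void addTask(const std::string& id, double cpus) {
    bool inserted = tasks_.emplace(id, Task{cpus, false}).second;
    assert(inserted);   // Task ids are assumed to be unique.
    usedCpus_ += cpus;  // Running total, maintained incrementally.
  }

  // The task stays in the map until it is acknowledged, but it stops
  // counting towards the used resources once it becomes terminal.
  void markTerminal(const std::string& id) {
    auto it = tasks_.find(id);
    assert(it != tasks_.end());
    if (!it->second.terminal) {
      it->second.terminal = true;
      usedCpus_ -= it->second.cpus;
    }
  }

  // On-demand: O(number of tasks) on every call.
  double usedOnDemand() const {
    double total = 0.0;
    for (const auto& entry : tasks_) {
      if (!entry.second.terminal) {
        total += entry.second.cpus;
      }
    }
    return total;
  }

  // Running count: O(1) on every call.
  double usedCached() const { return usedCpus_; }

private:
  std::unordered_map<std::string, Task> tasks_;
  double usedCpus_ = 0.0;
};
```

Both accessors return the same value in this sketch; the difference is only where the cost is paid. With many slaves and tasks, an endpoint that calls the on-demand version for every slave repeats the full walk on each request.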
{noformat}
$ time curl localhost:5050/health
real 0m0.004s
user 0m0.001s
sys 0m0.002s
$ time curl localhost:5050/stats.json > /dev/null
real 0m15.402s
user 0m0.001s
sys 0m0.003s
$ time curl localhost:5050/metrics/snapshot > /dev/null
real 0m6.059s
user 0m0.002s
sys 0m0.002s
{noformat}
{{perf top}} shows some of the resource computation (e.g. {{Slave::used()}} and
the protobuf {{MergeFrom}} calls) during a request to /stats.json:
{noformat: perf top}
Events: 36K cycles
10.53% libc-2.5.so [.] _int_free
9.90% libc-2.5.so [.] malloc
8.56% libmesos-0.21.0.so [.] std::_Rb_tree<process::ProcessBase*, process::ProcessBase*, std::_Identity<process::ProcessBase*>, std::less<process::ProcessBase*>, std::allocator<process::ProcessBase*> >::
8.23% libc-2.5.so [.] _int_malloc
5.80% libstdc++.so.6.0.8 [.] std::_Rb_tree_increment(std::_Rb_tree_node_base*)
5.33% [kernel] [k] _raw_spin_lock
3.13% libstdc++.so.6.0.8 [.] std::string::assign(std::string const&)
2.95% libmesos-0.21.0.so [.] process::SocketManager::exited(process::ProcessBase*)
2.43% libmesos-0.21.0.so [.] mesos::Resource::MergeFrom(mesos::Resource const&)
1.88% libmesos-0.21.0.so [.] mesos::internal::master::Slave::used() const
1.48% libstdc++.so.6.0.8 [.] __gnu_cxx::__atomic_add(int volatile*, int)
1.45% [kernel] [k] find_busiest_group
1.41% libc-2.5.so [.] free
1.38% libmesos-0.21.0.so [.] mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
1.13% libmesos-0.21.0.so [.] mesos::Value_Scalar::MergeFrom(mesos::Value_Scalar const&)
1.12% libmesos-0.21.0.so [.] mesos::Resource::SharedDtor()
1.07% libstdc++.so.6.0.8 [.] __gnu_cxx::__exchange_and_add(int volatile*, int)
0.94% libmesos-0.21.0.so [.] google::protobuf::UnknownFieldSet::MergeFrom(google::protobuf::UnknownFieldSet const&)
0.92% libstdc++.so.6.0.8 [.] operator new(unsigned long)
0.88% libmesos-0.21.0.so [.] mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&)
0.75% libmesos-0.21.0.so [.] mesos::matches(mesos::Resource const&, mesos::Resource const&)
{noformat}