[ https://issues.apache.org/jira/browse/MESOS-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dominic Hamon updated MESOS-1862:
---------------------------------
Sprint: Mesos Q3 Sprint 7
> Performance regression in the Master's http metrics.
> ----------------------------------------------------
>
> Key: MESOS-1862
> URL: https://issues.apache.org/jira/browse/MESOS-1862
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 0.21.0
> Reporter: Benjamin Mahler
> Assignee: Benjamin Mahler
> Priority: Blocker
>
> As part of the change to hold on to terminal unacknowledged tasks in the
> master, we introduced a performance regression in the following patch:
> https://github.com/apache/mesos/commit/0760b007ad65bc91e8cea377339978c78d36d247
> {noformat}
> commit 0760b007ad65bc91e8cea377339978c78d36d247
> Author: Benjamin Mahler <[email protected]>
> Date: Thu Sep 11 10:48:20 2014 -0700
> Minor cleanups to the Master code.
> Review: https://reviews.apache.org/r/25566
> {noformat}
> Rather than keeping a running count of allocated resources, we now compute
> resources on-demand. This was done in order to ignore terminal tasks'
> resources.
> As a result of this change, the /stats.json and /metrics/snapshot endpoints
> on the master have slowed down substantially on large clusters.
> {noformat}
> $ time curl localhost:5050/health
> real 0m0.004s
> user 0m0.001s
> sys 0m0.002s
> $ time curl localhost:5050/stats.json > /dev/null
> real 0m15.402s
> user 0m0.001s
> sys 0m0.003s
> $ time curl localhost:5050/metrics/snapshot > /dev/null
> real 0m6.059s
> user 0m0.002s
> sys 0m0.002s
> {noformat}
> {{perf top}} reveals some of the resource computation during a request to
> stats.json:
> {noformat: perf top}
> Events: 36K cycles
> 10.53% libc-2.5.so [.] _int_free
> 9.90% libc-2.5.so [.] malloc
> 8.56% libmesos-0.21.0.so [.] std::_Rb_tree<process::ProcessBase*,
> process::ProcessBase*, std::_Identity<process::ProcessBase*>,
> std::less<process::ProcessBase*>, std::allocator<process::ProcessBase*> >::
> 8.23% libc-2.5.so [.] _int_malloc
> 5.80% libstdc++.so.6.0.8 [.]
> std::_Rb_tree_increment(std::_Rb_tree_node_base*)
> 5.33% [kernel] [k] _raw_spin_lock
> 3.13% libstdc++.so.6.0.8 [.] std::string::assign(std::string const&)
> 2.95% libmesos-0.21.0.so [.]
> process::SocketManager::exited(process::ProcessBase*)
> 2.43% libmesos-0.21.0.so [.] mesos::Resource::MergeFrom(mesos::Resource
> const&)
> 1.88% libmesos-0.21.0.so [.] mesos::internal::master::Slave::used() const
> 1.48% libstdc++.so.6.0.8 [.] __gnu_cxx::__atomic_add(int volatile*,
> int)
> 1.45% [kernel] [k] find_busiest_group
> 1.41% libc-2.5.so [.] free
> 1.38% libmesos-0.21.0.so [.]
> mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
> 1.13% libmesos-0.21.0.so [.]
> mesos::Value_Scalar::MergeFrom(mesos::Value_Scalar const&)
> 1.12% libmesos-0.21.0.so [.] mesos::Resource::SharedDtor()
> 1.07% libstdc++.so.6.0.8 [.] __gnu_cxx::__exchange_and_add(int
> volatile*, int)
> 0.94% libmesos-0.21.0.so [.]
> google::protobuf::UnknownFieldSet::MergeFrom(google::protobuf::UnknownFieldSet
> const&)
> 0.92% libstdc++.so.6.0.8 [.] operator new(unsigned long)
> 0.88% libmesos-0.21.0.so [.]
> mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&)
> 0.75% libmesos-0.21.0.so [.] mesos::matches(mesos::Resource const&,
> mesos::Resource const&)
> {noformat}
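
To make the trade-off described in the issue concrete, here is a minimal
sketch contrasting a running count with on-demand computation. It is not the
Mesos code: Task, Slave, usedCpus, usedOnDemand, addTask and markTerminal are
hypothetical, simplified names, and resources are reduced to a single CPU
value. The on-demand path walks every task on each call, which is roughly the
kind of repeated work the perf output above suggests. The running count skips
terminal tasks by adjusting the total when a task becomes terminal.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins. These are NOT the Mesos types.
struct Task {
  std::string id;
  double cpus;
  bool terminal;  // true once the task has reached a terminal state
};

struct Slave {
  std::map<std::string, Task> tasks;
  double usedCpus = 0.0;  // running count, maintained incrementally

  // On-demand: iterates over every task on each call, so the cost grows
  // with the number of tasks. Terminal tasks are skipped.
  double usedOnDemand() const {
    double total = 0.0;
    for (const auto& entry : tasks) {
      if (!entry.second.terminal) {
        total += entry.second.cpus;
      }
    }
    return total;
  }

  // Running count: only the new task's resources are touched.
  void addTask(const Task& task) {
    const bool inserted = tasks.emplace(task.id, task).second;
    if (inserted && !task.terminal) {
      usedCpus += task.cpus;
    }
  }

  // The task is kept (for example, while unacknowledged) but stops
  // counting towards the used resources.
  void markTerminal(const std::string& id) {
    auto it = tasks.find(id);
    if (it != tasks.end() && !it->second.terminal) {
      it->second.terminal = true;
      usedCpus -= it->second.cpus;
    }
  }
};

// Sanity check: both approaches should agree, up to a small tolerance.
inline void checkConsistent(const Slave& slave) {
  assert(std::fabs(slave.usedOnDemand() - slave.usedCpus) < 1e-9);
}
```

In this sketch a metrics endpoint could read usedCpus in constant time
instead of calling usedOnDemand() for every slave on every request. Whether
the actual fix takes this shape is an assumption; the issue only states the
cause of the regression.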