[ https://issues.apache.org/jira/browse/MESOS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292692#comment-15292692 ]
Yanyan Hu commented on MESOS-3051:
----------------------------------
Hi all, I'm trying to use Mesos to manage a large-scale container cluster,
using Mesos 0.25.0 with Marathon running on top of it. When testing this
environment, I found that we still suffer from this issue when Marathon
allocates port resources randomly.
In my test, three Mesos slaves were active, each with available port resources
of [31000-37000]. I then tried to create more than 3000 tasks on these three
slave nodes. When the task count reached 3000, it took nearly 800 milliseconds
per slave to finish the calculation
"Resources available = slaves[slaveId].total - slaves[slaveId].allocated",
which is performed in the HierarchicalAllocatorProcess::allocate() function:
https://github.com/apache/mesos/blob/0.25.0/src/master/allocator/mesos/hierarchical.hpp#L1284
Since I have three Mesos slaves, each invocation of allocate() takes more than
2 seconds in total, which makes the Mesos master's performance very poor.
So I wrote a simple test to evaluate the performance of "Ranges" value
calculations, and found that the subtraction operation still performs poorly,
e.g. for res1 = [1-6000] and res2 = [1-1, 3-3, 5-5, ...].
I varied the range_size of res2 and recorded the execution time of
"res1 -= res2"; the results are as follows (the test was run in an x86 VM with
4 processor cores and 16 GB of memory):
res2 range_size    execution time (s)
1                  0.003 (0.002 in kernel mode)
100                0.011
200                0.031
400                0.121
800                0.533
1600               2.157
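By comparison, the addition and comparison operations perform much better, so
it looks like the current fix hasn't completely resolved this problem. From
range_size 200 upward, each doubling of range_size roughly quadruples the
execution time (e.g. 0.533 s at 800 vs. 2.157 s at 1600), which suggests that
the subtraction scales roughly quadratically with the number of ranges. In our
tests, the Mesos master's performance suffered seriously from this issue once
there were more than 10000 tasks on 20 active Mesos slave nodes.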
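For reference, when both operands are kept sorted and coalesced, range
subtraction does not inherently need to be quadratic: a single merge pass over
the two lists is linear in their combined length. Below is a minimal C++ sketch
under that assumption; `Interval` and `subtract()` are hypothetical names on
plain structs, not the protobuf Value_Ranges type or the actual code in
src/common/values.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain stand-in for a closed port range [begin, end]; this is
// NOT the protobuf-generated mesos::Value_Range type.
struct Interval {
  uint64_t begin;
  uint64_t end;
};

// Returns a - b. Assumes both inputs are sorted by `begin` and coalesced
// (non-overlapping, non-adjacent). One merge pass: O(|a| + |b|).
std::vector<Interval> subtract(const std::vector<Interval>& a,
                               const std::vector<Interval>& b) {
  std::vector<Interval> out;
  out.reserve(a.size() + b.size());
  std::size_t j = 0;  // first range in b that may still overlap the current r
  for (const Interval& r : a) {
    assert(r.begin <= r.end);
    // Skip ranges in b that end before r starts.
    while (j < b.size() && b[j].end < r.begin) {
      ++j;
    }
    uint64_t cur = r.begin;  // lowest value of r not yet emitted or removed
    bool consumed = false;   // set once some b[j] covers the rest of r
    while (j < b.size() && b[j].begin <= r.end) {
      if (b[j].begin > cur) {
        out.push_back({cur, b[j].begin - 1});  // gap before b[j] survives
      }
      if (b[j].end >= r.end) {
        // b[j] covers the tail of r; keep j, since b[j] may overlap the next r.
        consumed = true;
        break;
      }
      cur = b[j].end + 1;
      ++j;
    }
    if (!consumed) {
      out.push_back({cur, r.end});  // tail of r after the last overlap
    }
  }
  return out;
}
```

With res1 = [1-6000] and res2 = [1-1, 3-3, 5-5], this returns
[2-2, 4-4, 6-6000] after touching each input range once.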
I haven't tried the latest Mesos release, but after checking
src/common/values.cpp on the master branch, I found that the implementation of
the "Ranges" data type is almost the same as in the 0.25.0 release:
https://github.com/apache/mesos/blob/master/src/common/values.cpp
https://github.com/apache/mesos/blob/0.25.0/src/common/values.cpp
So I guess the problem is still there? Is there any way we can further
optimize the implementation of the "Ranges" data type to avoid this
performance bottleneck? Thanks.
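One possible direction, sketched here under the assumption that ranges could be
held in a plain contiguous std::vector instead of a protobuf RepeatedPtrField:
coalescing becomes one sort plus one linear pass that merges overlapping or
adjacent ranges in place, without allocating or destroying a message object per
range. `Interval` and `coalesceInPlace()` are hypothetical names, not the
existing mesos::coalesce().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain stand-in for a closed range [begin, end]; NOT the
// protobuf-generated mesos::Value_Range type.
struct Interval {
  uint64_t begin;
  uint64_t end;
};

// Sorts `ranges` and merges overlapping or adjacent entries in place:
// one sort plus one linear pass, and no per-range heap allocation.
void coalesceInPlace(std::vector<Interval>* ranges) {
  std::vector<Interval>& v = *ranges;
  if (v.empty()) {
    return;
  }
  for (const Interval& r : v) {
    assert(r.begin <= r.end);
  }
  std::sort(v.begin(), v.end(), [](const Interval& x, const Interval& y) {
    return x.begin < y.begin;
  });
  std::size_t last = 0;  // index of the last merged range kept so far
  for (std::size_t i = 1; i < v.size(); ++i) {
    // Merge if v[i] overlaps v[last] or starts right after it (e.g. 1-2, 3-4).
    // The subtraction is only evaluated when v[i].begin > v[last].end, so it
    // cannot underflow.
    if (v[i].begin <= v[last].end || v[i].begin - v[last].end == 1) {
      v[last].end = std::max(v[last].end, v[i].end);
    } else {
      v[++last] = v[i];
    }
  }
  v.resize(last + 1);  // drop the slots that were merged away
}
```

For example, [5-5, 1-2, 3-4, 10-12, 11-20] becomes [1-5, 10-20].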
> performance issues with port ranges comparison
> ----------------------------------------------
>
> Key: MESOS-3051
> URL: https://issues.apache.org/jira/browse/MESOS-3051
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Affects Versions: 0.22.1
> Reporter: James Peach
> Assignee: Joerg Schad
> Labels: mesosphere
> Fix For: 0.25.0, 0.24.2
>
>
> Testing in an environment with lots of frameworks (>200), where the
> frameworks permanently decline resources they don't need. The allocator ends
> up spending a lot of time figuring out whether offers are refused (the code
> path through {{HierarchicalAllocatorProcess::isFiltered()}}).
> In profiling a synthetic benchmark, it turns out that comparing port ranges
> is very expensive, involving many temporary allocations. 61% of
> Resources::contains() run time is in operator -= (Resource). 35% of
> Resources::contains() run time is in Resources::_contains().
> The heaviest call chain through {{Resources::_contains}} is:
> {code}
> Running Time Self (ms) Symbol Name
> 7237.0ms 35.5% 4.0 mesos::Resources::_contains(mesos::Resource const&) const
> 7200.0ms 35.3% 1.0 mesos::contains(mesos::Resource const&, mesos::Resource const&)
> 7133.0ms 35.0% 121.0 mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
> 6319.0ms 31.0% 7.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Ranges const&)
> 6240.0ms 30.6% 161.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 1867.0ms 9.1% 25.0 mesos::Value_Ranges::add_range()
> 1694.0ms 8.3% 4.0 mesos::Value_Ranges::~Value_Ranges()
> 1495.0ms 7.3% 16.0 mesos::Value_Ranges::operator=(mesos::Value_Ranges const&)
> 445.0ms 2.1% 94.0 mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
> 154.0ms 0.7% 24.0 mesos::Value_Ranges::range(int) const
> 103.0ms 0.5% 24.0 mesos::Value_Ranges::range_size() const
> 95.0ms 0.4% 2.0 mesos::Value_Range::Value_Range(mesos::Value_Range const&)
> 59.0ms 0.2% 4.0 mesos::Value_Ranges::Value_Ranges()
> 50.0ms 0.2% 50.0 mesos::Value_Range::begin() const
> 28.0ms 0.1% 28.0 mesos::Value_Range::end() const
> 26.0ms 0.1% 0.0 mesos::Value_Range::~Value_Range()
> {code}
> mesos::coalesce(Value_Ranges) gets done a lot and ends up being really
> expensive. The heaviest parts of the inverted call chain are:
> {code}
> Running Time Self (ms) Symbol Name
> 3209.0ms 15.7% 3209.0 mesos::Value_Range::~Value_Range()
> 3209.0ms 15.7% 0.0 google::protobuf::internal::GenericTypeHandler<mesos::Value_Range>::Delete(mesos::Value_Range*)
> 3209.0ms 15.7% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 3209.0ms 15.7% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::~RepeatedPtrField()
> 3209.0ms 15.7% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::~RepeatedPtrField()
> 3209.0ms 15.7% 0.0 mesos::Value_Ranges::~Value_Ranges()
> 3209.0ms 15.7% 0.0 mesos::Value_Ranges::~Value_Ranges()
> 2441.0ms 11.9% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 452.0ms 2.2% 0.0 mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&)
> 169.0ms 0.8% 0.0 mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
> 82.0ms 0.4% 0.0 mesos::operator-=(mesos::Value_Ranges&, mesos::Value_Ranges const&)
> 65.0ms 0.3% 0.0 mesos::Value_Ranges::~Value_Ranges()
> 2541.0ms 12.4% 2541.0 google::protobuf::internal::GenericTypeHandler<mesos::Value_Range>::New()
> 2541.0ms 12.4% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 2305.0ms 11.3% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::Add()
> 2305.0ms 11.3% 0.0 mesos::Value_Ranges::add_range()
> 1962.0ms 9.6% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 343.0ms 1.6% 0.0 mesos::ranges::add(mesos::Value_Ranges*, long long, long long)
> 236.0ms 1.1% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> 1471.0ms 7.2% 1471.0 google::protobuf::internal::RepeatedPtrFieldBase::Reserve(int)
> 1333.0ms 6.5% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 1333.0ms 6.5% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::Add()
> 1333.0ms 6.5% 0.0 mesos::Value_Ranges::add_range()
> 1086.0ms 5.3% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 247.0ms 1.2% 0.0 mesos::ranges::add(mesos::Value_Ranges*, long long, long long)
> 107.0ms 0.5% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> 107.0ms 0.5% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::MergeFrom(google::protobuf::RepeatedPtrField<mesos::Value_Range> const&)
> 107.0ms 0.5% 0.0 mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&)
> 105.0ms 0.5% 0.0 mesos::Value_Ranges::CopyFrom(mesos::Value_Ranges const&)
> 105.0ms 0.5% 0.0 mesos::Value_Ranges::operator=(mesos::Value_Ranges const&)
> 104.0ms 0.5% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 1.0ms 0.0% 0.0 mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&)
> 2.0ms 0.0% 0.0 mesos::Resource::MergeFrom(mesos::Resource const&)
> 2.0ms 0.0% 0.0 google::protobuf::internal::GenericTypeHandler<mesos::Resource>::Merge(mesos::Resource const&, mesos::Resource*)
> 2.0ms 0.0% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Resource>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> 29.0ms 0.1% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Resource>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> 898.0ms 4.4% 898.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 517.0ms 2.5% 0.0 google::protobuf::RepeatedPtrField<mesos::Value_Range>::Add()
> 517.0ms 2.5% 0.0 mesos::Value_Ranges::add_range()
> 429.0ms 2.1% 0.0 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 88.0ms 0.4% 0.0 mesos::ranges::add(mesos::Value_Ranges*, long long, long long)
> 379.0ms 1.8% 0.0 void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> {code}
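The quoted profile attributes most of the Resources::_contains() time to
operator<= on Value_Ranges, which spends it in coalesce(), largely creating and
destroying Value_Range objects. A subset test over ranges that are already
sorted and coalesced needs none of those temporaries. A minimal sketch, assuming
that invariant holds for both inputs; `Interval` and `isSubset()` are
hypothetical names, not the Mesos API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain stand-in for a closed range [begin, end]; NOT the
// protobuf-generated mesos::Value_Range type.
struct Interval {
  uint64_t begin;
  uint64_t end;
};

// Returns true if every value in `inner` is also in `outer`. Assumes both
// lists are sorted by `begin` and `outer` is coalesced (non-overlapping,
// non-adjacent). Linear time, no allocation, no copies.
bool isSubset(const std::vector<Interval>& inner,
              const std::vector<Interval>& outer) {
  std::size_t j = 0;
  for (const Interval& r : inner) {
    assert(r.begin <= r.end);
    // Advance to the first outer range that does not end before r begins.
    while (j < outer.size() && outer[j].end < r.begin) {
      ++j;
    }
    // Since outer is coalesced, r must fit entirely inside outer[j].
    if (j == outer.size() || outer[j].begin > r.begin ||
        outer[j].end < r.end) {
      return false;
    }
  }
  return true;
}
```

Because consecutive ranges in a coalesced list are separated by at least one
missing value, each contiguous inner range must fit inside a single outer
range, which is why one forward pass is sufficient.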
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)