[ 
https://issues.apache.org/jira/browse/MESOS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292692#comment-15292692
 ] 

Yanyan Hu commented on MESOS-3051:
----------------------------------

Hi all, I'm using Mesos to manage a large-scale container cluster, running 
Mesos 0.25.0 with Marathon on top of it. In my tests with this environment we 
still hit this issue when Marathon allocates port resources randomly.

In my test, three Mesos slaves were active, each with an available port 
range of [31000-37000]. I then tried to create more than 3000 tasks across the 
three slave nodes. Once the task count reached 3000, it took nearly 800 
milliseconds to finish one calculation of "Resources available = 
slaves[slaveId].total - slaves[slaveId].allocated", which is performed in the 
HierarchicalAllocatorProcess::allocate() function:
https://github.com/apache/mesos/blob/0.25.0/src/master/allocator/mesos/hierarchical.hpp#L1284

Since I have three Mesos slaves, each invocation of allocate() takes more 
than 2 seconds in total, which makes the Mesos master perform very poorly.

So I wrote a simple test to evaluate the performance of "Ranges" value 
calculations. I found that the subtraction operation still performs poorly:

e.g. res1 = [1-6000], res2 = [1-1, 3-3, 5-5, ...]
I varied the range_size of res2 and recorded the execution time of "res1 -= 
res2". The results are as follows:
(The test was run in an x86 VM with 4 processor cores and 16GB of memory)

    res2 range_size    execution time (seconds)
    1                  0.003   (0.002 in kernel mode)
    100                0.011
    200                0.031
    400                0.121
    800                0.533
    1600               2.157
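
As a rough way to read the table above, the growth between consecutive rows 
can be turned into an exponent k in "time ~ range_size^k". The sketch below 
only reuses the numbers from the table; the helper names growthExponent and 
tableExponents are made up for illustration and are not part of Mesos.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Estimates k in "time ~ size^k" from two (size, time) measurements.
// Hypothetical helper used only to read the table; it is not Mesos code.
double growthExponent(double size1, double time1, double size2, double time2)
{
  assert(size1 > 0 && size2 > 0 && size1 != size2);
  assert(time1 > 0 && time2 > 0);
  return std::log(time2 / time1) / std::log(size2 / size1);
}

// Exponents between consecutive table rows, starting at range_size 100.
// The first row (range_size 1) is left out on the assumption that it is
// dominated by fixed overhead rather than by the subtraction itself.
std::vector<double> tableExponents()
{
  // (range_size, seconds) pairs copied from the table.
  const std::vector<std::pair<double, double>> rows = {
    {100, 0.011}, {200, 0.031}, {400, 0.121}, {800, 0.533}, {1600, 2.157}};

  std::vector<double> exponents;
  for (std::size_t i = 1; i < rows.size(); ++i) {
    exponents.push_back(growthExponent(
        rows[i - 1].first, rows[i - 1].second,
        rows[i].first, rows[i].second));
  }
  return exponents;
}
```

From range_size 200 upward the estimated exponent stays close to 2, so each 
doubling of range_size costs roughly four times as much. That is consistent 
with roughly quadratic growth of "res1 -= res2" in the number of ranges.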

By comparison, the addition and comparison operations perform much better. So 
it looks like the current fix hasn't completely resolved this problem. Based on 
our tests, the Mesos master's performance suffers seriously from this issue 
when there are more than 10000 tasks on 20 active Mesos slave nodes.

I haven't tried the latest Mesos release, but after checking 
src/common/values.cpp in the master branch, I found that the implementation of 
the "Ranges" data type is almost the same as in the 0.25.0 release:
https://github.com/apache/mesos/blob/master/src/common/values.cpp
https://github.com/apache/mesos/blob/0.25.0/src/common/values.cpp

So I guess the problem is still there? Is there any way to further optimize 
the implementation of the "Ranges" data type so we can avoid this performance 
bottleneck? Thanks.
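
To make the question concrete, here is a minimal sketch of one possible 
direction: subtracting two range lists in a single merge-style pass, appending 
to one output vector instead of creating a temporary per removed range. It is 
not the Mesos implementation. The Range struct and the subtract function are 
hypothetical stand-ins for the protobuf types, and the sketch assumes both 
inputs are already sorted by begin and contain no overlapping ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Closed interval [begin, end]. Hypothetical stand-in for a single port
// range; it is not the protobuf-generated type.
struct Range
{
  uint64_t begin;
  uint64_t end;
};

// Returns lhs - rhs in O(|lhs| + |rhs|).
// Assumption: both inputs are sorted by begin and non-overlapping.
std::vector<Range> subtract(
    const std::vector<Range>& lhs,
    const std::vector<Range>& rhs)
{
  std::vector<Range> result;
  result.reserve(lhs.size() + rhs.size());

  std::size_t j = 0; // Index into rhs; it never moves backwards.
  for (const Range& l : lhs) {
    assert(l.begin <= l.end);

    uint64_t start = l.begin; // First value of l not yet decided.
    bool covered = false;     // True once rhs covers l up to l.end.

    // Skip rhs ranges that end before the undecided part of l.
    while (j < rhs.size() && rhs[j].end < start) {
      ++j;
    }

    // Walk over the rhs ranges that overlap l.
    while (j < rhs.size() && rhs[j].begin <= l.end) {
      if (rhs[j].begin > start) {
        result.push_back({start, rhs[j].begin - 1}); // Gap before rhs[j].
      }
      if (rhs[j].end >= l.end) {
        // rhs[j] may also overlap the next lhs range, so keep j here.
        covered = true;
        break;
      }
      start = rhs[j].end + 1;
      ++j;
    }

    if (!covered) {
      result.push_back({start, l.end}); // Tail of l after the last overlap.
    }
  }

  return result;
}
```

With res1 = [1-6000] and res2 = [1-1, 3-3, 5-5], this returns [2-2, 4-4, 
6-6000]. Each input range is visited once, so the cost should grow linearly 
with range_size. Whether this layout fits the existing protobuf-based Ranges 
code is an open question.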

> performance issues with port ranges comparison
> ----------------------------------------------
>
>                 Key: MESOS-3051
>                 URL: https://issues.apache.org/jira/browse/MESOS-3051
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>    Affects Versions: 0.22.1
>            Reporter: James Peach
>            Assignee: Joerg Schad
>              Labels: mesosphere
>             Fix For: 0.25.0, 0.24.2
>
>
> Testing in an environment with lots of frameworks (>200), where the 
> frameworks permanently decline resources they don't need. The allocator ends 
> up spending a lot of time figuring out whether offers are refused (the code 
> path through {{HierarchicalAllocatorProcess::isFiltered()}}).
> In profiling a synthetic benchmark, it turns out that comparing port ranges 
> is very expensive, involving many temporary allocations. 61% of 
> Resources::contains() run time is in operator -= (Resource). 35% of 
> Resources::contains() run time is in Resources::_contains().
> The heaviest call chain through {{Resources::_contains}} is:
> {code}
> Running Time          Self (ms)         Symbol Name
> 7237.0ms   35.5%          4.0            mesos::Resources::_contains(mesos::Resource const&) const
> 7200.0ms   35.3%          1.0             mesos::contains(mesos::Resource const&, mesos::Resource const&)
> 7133.0ms   35.0%        121.0              mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
> 6319.0ms   31.0%          7.0               mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Ranges const&)
> 6240.0ms   30.6%        161.0                mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 1867.0ms    9.1%         25.0                 mesos::Value_Ranges::add_range()
> 1694.0ms    8.3%          4.0                 mesos::Value_Ranges::~Value_Ranges()
> 1495.0ms    7.3%         16.0                 mesos::Value_Ranges::operator=(mesos::Value_Ranges const&)
>  445.0ms    2.1%         94.0                 mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
>  154.0ms    0.7%         24.0                 mesos::Value_Ranges::range(int) const
>  103.0ms    0.5%         24.0                 mesos::Value_Ranges::range_size() const
>   95.0ms    0.4%          2.0                 mesos::Value_Range::Value_Range(mesos::Value_Range const&)
>   59.0ms    0.2%          4.0                 mesos::Value_Ranges::Value_Ranges()
>   50.0ms    0.2%         50.0                 mesos::Value_Range::begin() const
>   28.0ms    0.1%         28.0                 mesos::Value_Range::end() const
>   26.0ms    0.1%          0.0                 mesos::Value_Range::~Value_Range()
> {code}
> mesos::coalesce(Value_Ranges) gets done a lot and ends up being really 
> expensive. The heaviest parts of the inverted call chain are:
> {code}
> Running Time  Self (ms)               Symbol Name
> 3209.0ms   15.7%      3209.0          mesos::Value_Range::~Value_Range()
> 3209.0ms   15.7%      0.0              google::protobuf::internal::GenericTypeHandler<mesos::Value_Range>::Delete(mesos::Value_Range*)
> 3209.0ms   15.7%      0.0               void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 3209.0ms   15.7%      0.0                google::protobuf::RepeatedPtrField<mesos::Value_Range>::~RepeatedPtrField()
> 3209.0ms   15.7%      0.0                 google::protobuf::RepeatedPtrField<mesos::Value_Range>::~RepeatedPtrField()
> 3209.0ms   15.7%      0.0                  mesos::Value_Ranges::~Value_Ranges()
> 3209.0ms   15.7%      0.0                   mesos::Value_Ranges::~Value_Ranges()
> 2441.0ms   11.9%      0.0                    mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>  452.0ms    2.2%      0.0                    mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&)
>  169.0ms    0.8%      0.0                    mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
>   82.0ms    0.4%      0.0                    mesos::operator-=(mesos::Value_Ranges&, mesos::Value_Ranges const&)
>   65.0ms    0.3%      0.0                    mesos::Value_Ranges::~Value_Ranges()
> 2541.0ms   12.4%      2541.0          google::protobuf::internal::GenericTypeHandler<mesos::Value_Range>::New()
> 2541.0ms   12.4%      0.0              google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 2305.0ms   11.3%      0.0               google::protobuf::RepeatedPtrField<mesos::Value_Range>::Add()
> 2305.0ms   11.3%      0.0                mesos::Value_Ranges::add_range()
> 1962.0ms    9.6%      0.0                 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>  343.0ms    1.6%      0.0                 mesos::ranges::add(mesos::Value_Ranges*, long long, long long)
> 236.0ms    1.1%       0.0               void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> 1471.0ms    7.2%      1471.0          google::protobuf::internal::RepeatedPtrFieldBase::Reserve(int)
> 1333.0ms    6.5%      0.0              google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 1333.0ms    6.5%      0.0               google::protobuf::RepeatedPtrField<mesos::Value_Range>::Add()
> 1333.0ms    6.5%      0.0                mesos::Value_Ranges::add_range()
> 1086.0ms    5.3%      0.0                 mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>  247.0ms    1.2%      0.0                 mesos::ranges::add(mesos::Value_Ranges*, long long, long long)
> 107.0ms    0.5%       0.0              void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> 107.0ms    0.5%       0.0               google::protobuf::RepeatedPtrField<mesos::Value_Range>::MergeFrom(google::protobuf::RepeatedPtrField<mesos::Value_Range> const&)
> 107.0ms    0.5%       0.0                mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&)
> 105.0ms    0.5%       0.0                 mesos::Value_Ranges::CopyFrom(mesos::Value_Ranges const&)
> 105.0ms    0.5%       0.0                  mesos::Value_Ranges::operator=(mesos::Value_Ranges const&)
> 104.0ms    0.5%       0.0                   mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>   1.0ms    0.0%       0.0                   mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&)
>   2.0ms    0.0%       0.0                 mesos::Resource::MergeFrom(mesos::Resource const&)
>   2.0ms    0.0%       0.0                  google::protobuf::internal::GenericTypeHandler<mesos::Resource>::Merge(mesos::Resource const&, mesos::Resource*)
>   2.0ms    0.0%       0.0                   void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Resource>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
>  29.0ms    0.1%       0.0              void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Resource>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> 898.0ms    4.4%       898.0           google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 517.0ms    2.5%       0.0              google::protobuf::RepeatedPtrField<mesos::Value_Range>::Add()
> 517.0ms    2.5%       0.0               mesos::Value_Ranges::add_range()
> 429.0ms    2.1%       0.0                mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>  88.0ms    0.4%       0.0                mesos::ranges::add(mesos::Value_Ranges*, long long, long long)
> 379.0ms    1.8%       0.0              void google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>(google::protobuf::internal::RepeatedPtrFieldBase const&)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
