[ 
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191325#comment-15191325
 ] 

Cong Wang commented on MESOS-4740:
----------------------------------

[~bmahler] No, we can't reproduce this bug yet. In theory, the cause could be 
anything related to CPU usage, since the Mesos master is not running in a 
contained environment. This is why, without further information, I can only 
optimize the metrics/snapshot call path. What we do know is that we have ~30K 
slaves and ~100K tasks per cluster. Each time we pull the stats, we iterate 
over these 100K tasks three times. This is a waste of CPU cycles, since the 
work can be amortized, so it is a clear improvement we can make now. Of 
course, we need more information before we can optimize further.
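
To illustrate the kind of saving meant by iterating over the tasks once instead of three times, here is a minimal Python sketch. It is not Mesos code: the task records, the `STATES` tuple, and both function names are hypothetical, and the choice of three per-state metrics is an assumption made only to mirror the three passes.

```python
from collections import Counter

# Hypothetical states, chosen only to mirror "three passes over the tasks".
STATES = ("TASK_STAGING", "TASK_STARTING", "TASK_RUNNING")

# Hypothetical task records; the real master holds richer C++ objects.
tasks = [
    {"id": "t1", "state": "TASK_RUNNING"},
    {"id": "t2", "state": "TASK_STAGING"},
    {"id": "t3", "state": "TASK_RUNNING"},
    {"id": "t4", "state": "TASK_STARTING"},
]


def count_states_multi_pass(tasks):
    """One full scan of the task list per metric: len(STATES) scans in total."""
    return {
        state: sum(1 for task in tasks if task["state"] == state)
        for state in STATES
    }


def count_states_single_pass(tasks):
    """A single scan that tallies every state at once."""
    counts = Counter(task["state"] for task in tasks)
    return {state: counts.get(state, 0) for state in STATES}


print(count_states_single_pass(tasks))
```

Both functions return the same counts; the second touches each task once. Keeping running counters that are updated whenever a task changes state would go further and avoid the scan at snapshot time altogether, at the cost of extra bookkeeping.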

> Improve master metrics/snapshot performance
> -------------------------------------------
>
>                 Key: MESOS-4740
>                 URL: https://issues.apache.org/jira/browse/MESOS-4740
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Cong Wang
>            Assignee: Cong Wang
>
> [~drobinson] noticed that retrieving metrics/snapshot statistics can be 
> very inefficient.
> {noformat}
> [user@server ~]$ time curl -s localhost:5050/metrics/snapshot
> real  0m35.654s
> user  0m0.019s
> sys   0m0.011s
> {noformat}
> MESOS-1287 introduced a timeout parameter for this query, but 
> metric collectors like ours are not aware of such a URL-specific 
> parameter, so we need to:
> 1) Always apply a timeout, with a default value when none is given.
> 2) Investigate why master metrics/snapshot can take such a long time to 
> complete under load.
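
For a collector that can be taught about the timeout parameter, a client might look like the Python sketch below. The parameter name `timeout` and the `10secs` value format are assumptions about what MESOS-1287 added, and `snapshot_url`, `fetch_snapshot`, and the default values are hypothetical names picked for illustration. The client-side timeout is a separate guard in case the server does not answer in time.

```python
import json
import urllib.parse
import urllib.request


def snapshot_url(host="localhost:5050", server_timeout="10secs"):
    """Build the snapshot URL with a server-side timeout (assumed parameter)."""
    query = urllib.parse.urlencode({"timeout": server_timeout})
    return f"http://{host}/metrics/snapshot?{query}"


def fetch_snapshot(host="localhost:5050", server_timeout="10secs",
                   client_timeout=15.0):
    """Fetch and decode the snapshot, giving up after client_timeout seconds."""
    url = snapshot_url(host, server_timeout)
    with urllib.request.urlopen(url, timeout=client_timeout) as response:
        return json.load(response)


print(snapshot_url())
```

Calling `fetch_snapshot()` requires a reachable master; `snapshot_url()` only builds the request URL and can be checked offline.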



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
