[ https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191325#comment-15191325 ]
Cong Wang commented on MESOS-4740:
----------------------------------
[~bmahler] No, we can't reproduce this bug yet. In theory, the cause could be
anything related to CPU usage, since the Mesos master is not running in a
contained environment. That is why, without further information, I can only
optimize the metrics/snapshot call path. What we do know is that we have ~30K
slaves and ~100K tasks per cluster. Each time we pull the stats, we iterate
over these 100K tasks three times. That is a waste of CPU cycles, since the
cost can be amortized, so this is one clear improvement. Of course, we need
more information before we can optimize further.
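
To illustrate what amortizing the per-pull iteration could look like, here is a
minimal Python sketch. It is not the Mesos master code (which is C++); the
class and function names are hypothetical, and only the task state names are
taken from Mesos. The idea is to adjust a counter on every task state
transition so that a snapshot never has to walk all tasks.

```python
from collections import Counter


class TaskStateCounters:
    """Hypothetical helper: keeps per-state task counts current on every transition."""

    def __init__(self):
        self._state_of = {}       # task_id -> current state
        self._counts = Counter()  # state -> number of tasks in that state

    def update(self, task_id, new_state):
        # O(1) work per transition instead of a full scan per snapshot.
        old_state = self._state_of.get(task_id)
        if old_state is not None:
            self._counts[old_state] -= 1
        self._state_of[task_id] = new_state
        self._counts[new_state] += 1

    def remove(self, task_id):
        old_state = self._state_of.pop(task_id, None)
        if old_state is not None:
            self._counts[old_state] -= 1

    def snapshot(self):
        # Cost depends on the number of distinct states, not on the number of tasks.
        return {state: n for state, n in self._counts.items() if n > 0}


def snapshot_by_scanning(state_of):
    """Baseline for comparison: one full pass over all tasks per snapshot."""
    return dict(Counter(state_of.values()))
```

With ~100K tasks, the scanning baseline touches every task on each pull, while
the counter version only pays when a task actually changes state.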
> Improve master metrics/snapshot performance
> -------------------------------------------
>
> Key: MESOS-4740
> URL: https://issues.apache.org/jira/browse/MESOS-4740
> Project: Mesos
> Issue Type: Task
> Reporter: Cong Wang
> Assignee: Cong Wang
>
> [~drobinson] noticed that retrieving metrics/snapshot statistics can be very
> inefficient.
> {noformat}
> [user@server ~]$ time curl -s localhost:5050/metrics/snapshot
> real 0m35.654s
> user 0m0.019s
> sys 0m0.011s
> {noformat}
> MESOS-1287 introduced a timeout parameter for this query, but metric
> collectors like ours are not aware of such a URL-specific parameter, so we
> need to:
> 1) Always apply a timeout, with a default value when none is given.
> 2) Investigate why master metrics/snapshot can take such a long time to
> complete under load.
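
Item 1 above could be sketched roughly as follows. This is a hedged Python
illustration, not the Mesos implementation: the function name, the 5-second
default, and the gauge callables are all assumptions. A caller that passes no
timeout still gets a bounded response, and metrics that are not ready by the
deadline are simply omitted.

```python
import concurrent.futures as cf
import time

DEFAULT_TIMEOUT_SECS = 5.0  # assumed default; the issue does not specify a value


def snapshot(gauges, timeout=None):
    """Evaluate each gauge, returning only those that finish before the deadline.

    `gauges` maps a metric name to a zero-argument callable (hypothetical shape).
    """
    if timeout is None:
        # Callers unaware of the parameter still get a bounded wait.
        timeout = DEFAULT_TIMEOUT_SECS
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(gauges)))
    try:
        futures = {pool.submit(fn): name for name, fn in gauges.items()}
        done, _pending = cf.wait(futures, timeout=timeout)
        # Skip gauges that raised; pending ones are left out of the result.
        return {futures[f]: f.result() for f in done if f.exception() is None}
    finally:
        # Do not block the response on gauges that are still running.
        pool.shutdown(wait=False)
```

For example, `snapshot(gauges)` waits at most the default, while
`snapshot(gauges, timeout=0.1)` returns whatever is ready after 100 ms.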
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)