[
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Robinson updated MESOS-4740:
----------------------------------
Description:
[~drobinson] noticed that retrieving metrics/snapshot statistics can be very
slow.
{noformat}
[user@server ~]$ time curl -s localhost:5050/metrics/snapshot
real 0m35.654s
user 0m0.019s
sys 0m0.011s
{noformat}
MESOS-1287 introduced a timeout parameter for this query, but
metric collectors like ours are not aware of such a URL-specific parameter,
so we need to:
1) Always apply a timeout, with a default value.
2) Investigate why metrics/snapshot can take so long to complete
under load, given that we don't use history for these statistics and the
values are just atomic reads.
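For collectors that can be changed, a minimal sketch of bounding the request from both sides might look like the following. It assumes the timeout parameter from MESOS-1287 is passed as a `timeout` query argument with a duration string such as `5secs`; that format, and the helper names `snapshot_url` and `fetch_snapshot`, are illustrative assumptions rather than anything confirmed in this ticket.

```python
import json
import urllib.parse
import urllib.request


def snapshot_url(host: str, port: int, timeout_secs: int) -> str:
    """Build a metrics/snapshot URL carrying a server-side timeout parameter."""
    # Assumption: the endpoint accepts a duration string such as "5secs".
    query = urllib.parse.urlencode({"timeout": f"{timeout_secs}secs"})
    return f"http://{host}:{port}/metrics/snapshot?{query}"


def fetch_snapshot(host: str = "localhost", port: int = 5050,
                   timeout_secs: int = 5) -> dict:
    """Fetch the snapshot with a timeout on both the server and client side."""
    url = snapshot_url(host, port, timeout_secs)
    # The client-side socket timeout is set slightly above the server-side
    # value, so a stalled connection does not block the collector indefinitely.
    with urllib.request.urlopen(url, timeout=timeout_secs + 1) as response:
        return json.load(response)
```

Collectors that cannot pass URL parameters would still depend on item 1, a default timeout on the server side.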
was:
David Robinson noticed that retrieving metrics/snapshot statistics can be very
slow and can cause the Mesos master to get stuck.
{noformat}
[root@atla-bny-34-sr1 ~]# time curl -s localhost:5051/metrics/snapshot
real 2m7.302s
user 0m0.001s
sys 0m0.004s
{noformat}
MESOS-1287 introduced a timeout parameter for this query, but observers
like ours are not aware of such a URL-specific parameter, so we need to:
1) Always apply a timeout, with a default value.
2) Investigate why metrics/snapshot can take so long to complete
under load, given that we don't use history for these statistics and the
values are just atomic reads.
> Improve metrics/snapshot performance
> ------------------------------------
>
> Key: MESOS-4740
> URL: https://issues.apache.org/jira/browse/MESOS-4740
> Project: Mesos
> Issue Type: Task
> Reporter: Cong Wang
> Assignee: Cong Wang
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)