sri krishna created MESOS-8731:
----------------------------------
Summary: mesos master APIs become latent
Key: MESOS-8731
URL: https://issues.apache.org/jira/browse/MESOS-8731
Project: Mesos
Issue Type: Bug
Components: master
Affects Versions: 1.5.0, 1.4.0
Reporter: sri krishna
Over time, one of the UI's API calls to the master becomes slow.
A request that normally takes less than a second takes up to 20 seconds
during peak periods. Much of the dev team uses the UI to access logs.
Below are my observations:
In Mesos "0.28.1-2.0.20.ubuntu1404":
################################################################
# ab -n 1000 -c 10 "http://mesos-master1.mesos.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking mesos-master1.mesos.bla.net (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: mesos-master1.mesos.bla.net
Server Port: 5050
Document Path: /metrics/snapshot?jsonp=angular.callbacks._4g
Document Length: 3197 bytes
Concurrency Level: 10
Time taken for tests: 501.010 seconds
Complete requests: 1000
Failed requests: 954
(Connect: 0, Receive: 0, Length: 954, Exceptions: 0)
Total transferred: 3304510 bytes
HTML transferred: 3195510 bytes
Requests per second: 2.00 [#/sec] (mean)
Time per request: 5010.104 [ms] (mean)
Time per request: 501.010 [ms] (mean, across all concurrent requests)
Transfer rate: 6.44 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     0    0.0      0      0
Processing:   321  4987  286.4   5007   5508
Waiting:      321  4987  286.4   5007   5508
Total:        321  4988  286.4   5007   5508
Percentage of the requests served within a certain time (ms)
50% 5007
66% 5007
75% 5008
80% 5008
90% 5008
95% 5009
98% 5010
99% 5506
100% 5508 (longest request)
################################################################
In Mesos 1.4 and 1.5 (versions 1.4.0-2.0.1 and 1.5.0-2.0.1), the response
time of the same API call is much higher:
################################################################
# ab -n 1000 -c 10 "http://mesos-master3.stage.bla.net:5050/metrics/snapshot?jsonp=angular.callbacks._4g"
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking mesos-master3.stage.bla.net (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
^C
Server Software:
Server Hostname: mesos-master3.stage.bla.net
Server Port: 5050
Document Path: /metrics/snapshot?jsonp=angular.callbacks._4g
Document Length: 6596 bytes
Concurrency Level: 10
Time taken for tests: 1405.182 seconds
Complete requests: 582
Failed requests: 580
(Connect: 0, Receive: 0, Length: 580, Exceptions: 0)
Total transferred: 3909986 bytes
HTML transferred: 3846548 bytes
Requests per second: 0.41 [#/sec] (mean)
Time per request: 24144.024 [ms] (mean)
Time per request: 2414.402 [ms] (mean, across all concurrent requests)
Transfer rate: 2.72 [Kbytes/sec] received
Connection Times (ms)
                min   mean[+/-sd] median    max
Connect:          0      0    0.0      0      0
Processing:   15284  24058 2600.7  23937  31740
Waiting:      15284  24058 2600.7  23937  31740
Total:        15284  24059 2600.7  23938  31740
Percentage of the requests served within a certain time (ms)
50% 23938
66% 25074
75% 25729
80% 26465
90% 27605
95% 28215
98% 29685
99% 30595
100% 31740 (longest request)
################################################################
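For comparison, the headline ab figures in both runs can be re-derived from the totals reported above (time taken, completed requests, concurrency). The following is only a sketch of that arithmetic; the function name `ab_summary` is made up for illustration and is not part of ab or Mesos.

```python
def ab_summary(total_seconds, completed, concurrency):
    """Recompute ab's summary figures from its reported totals."""
    # Requests per second (mean)
    rps = completed / total_seconds
    # Time per request (mean), in ms: latency as seen by one client
    per_request_ms = concurrency * total_seconds / completed * 1000
    # Time per request (mean, across all concurrent requests), in ms
    across_ms = total_seconds / completed * 1000
    return rps, per_request_ms, across_ms


# Totals copied from the two ab runs in this report.
old = ab_summary(total_seconds=501.010, completed=1000, concurrency=10)   # 0.28.1
new = ab_summary(total_seconds=1405.182, completed=582, concurrency=10)   # 1.4 / 1.5

# Ratio of the mean per-request latency between the two runs.
slowdown = new[1] / old[1]

print("0.28.1 : %.2f req/s, %.1f ms per request" % (old[0], old[1]))
print("1.4/1.5: %.2f req/s, %.1f ms per request" % (new[0], new[1]))
print("slowdown: %.1fx" % slowdown)
```

By this calculation the mean per-request latency is roughly 4.8 times higher on the 1.4/1.5 master than on the 0.28.1 master at the same concurrency.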
I think this is causing other APIs, such as "/master/slaves/" and "/metrics",
to become slow as well.
At this point we are forcing a re-election of the master to bring the times
down. What can I do to bring these times down? The load on the box is quite
low: the load average does not exceed 2 on an 8-core box.
Let me know if any further info is required.
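If it helps with further info, a small probe along the following lines could record how the latency of the endpoint grows between master re-elections. This is a minimal sketch, not something from the Mesos tooling: the helper names (`time_requests`, `percentile`, `fetch_snapshot`) are hypothetical, and the default URL is simply the one from the ab runs above with the jsonp parameter dropped.

```python
import math
import time
import urllib.request


def time_requests(fetch, n):
    """Call fetch() n times sequentially and return each latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def fetch_snapshot(url="http://mesos-master1.mesos.bla.net:5050/metrics/snapshot",
                   timeout=60):
    """Fetch the metrics snapshot once (URL taken from the ab runs above)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


# Example use (performs real HTTP requests, so it is left commented out):
# samples = sorted(time_requests(fetch_snapshot, 20))
# print("p50=%.3fs p99=%.3fs" % (percentile(samples, 50), percentile(samples, 99)))
```

Running this periodically and logging the percentiles next to the time since the last leader election would show whether the slowdown is gradual or sets in at a particular point.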