[
https://issues.apache.org/jira/browse/AURORA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014305#comment-14014305
]
David McLaughlin edited comment on AURORA-458 at 5/30/14 10:31 PM:
-------------------------------------------------------------------
So these profile runs show conclusively that GzipStream is the cause.
This is timed output from a local run with no network latency:
{code}
$ time curl -s 'http://localhost:8081/api' \
    -H 'Accept-Encoding: gzip,deflate,sdch' \
    --data-binary '[1,"getTasksStatus",1,0,{"1":{"rec":{"8":{"rec":{"1":{"str":"mesos"}}},"9":{"str":"test"},"2":{"str":"bigJob"}}}}]' \
    --compressed > /tmp/results
real 0m1.530s
user 0m0.014s
sys 0m0.011s
$ time curl -s 'http://localhost:8081/api' \
    -H 'Origin: http://localhost:8081' \
    --data-binary '[1,"getTasksStatus",1,0,{"1":{"rec":{"8":{"rec":{"1":{"str":"mesos"}}},"9":{"str":"test"},"2":{"str":"bigJob"}}}}]' \
    > /tmp/blah
real 0m0.297s
user 0m0.007s
sys 0m0.015s
{code}
As you can see, without compression the local request is about 5x faster (0.297s vs 1.530s).
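To get a rough feel for how much CPU time on-the-fly compression costs for a payload of this size, one could time gzip outside the scheduler. The following is a minimal Python sketch, not the scheduler's code path: the synthetic payload, the helper names `make_payload` and `time_gzip`, and the compression level are all assumptions for illustration, and absolute numbers will differ from what GzipStream does inside the JVM.

```python
import gzip
import json
import time


def make_payload(target_bytes):
    """Build a synthetic JSON payload of at least target_bytes.

    The task fields are hypothetical stand-ins for a getTasksStatus response.
    """
    tasks = []
    size = 0
    i = 0
    while size < target_bytes:
        task = {"role": "mesos", "env": "test", "job": "bigJob",
                "instance": i, "status": "RUNNING"}
        tasks.append(task)
        # +2 accounts for the ", " separator (or the enclosing brackets).
        size += len(json.dumps(task)) + 2
        i += 1
    return json.dumps(tasks).encode("utf-8")


def time_gzip(payload, level=6):
    """Compress payload once and return (compressed_bytes, elapsed_seconds)."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    elapsed = time.perf_counter() - start
    return compressed, elapsed


if __name__ == "__main__":
    # 3MB roughly matches the local payload size mentioned in this comment.
    payload = make_payload(3 * 1024 * 1024)
    compressed, elapsed = time_gzip(payload)
    print("raw bytes:        %d" % len(payload))
    print("compressed bytes: %d" % len(compressed))
    print("gzip time:        %.3fs" % elapsed)
```

A synthetic payload like this is far more repetitive than real task data, so it compresses faster and smaller than a production response would. Treat the result as a lower bound on the compression cost.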
With actual network latency, and a real production job with a much bigger
payload (10MB, versus 3MB locally):
{code}
$ time curl 'https://internal-scheduler/api' \
    -H 'Accept-Encoding: gzip,deflate,sdch' \
    --data-binary '[1,"getTasksStatus",1,0,{"1":{"rec":{"8":{"rec":{"1":{"str":"test"}}},"9":{"str":"prod"},"2":{"str":"bigJob"}}}}]' \
    --compressed > /tmp/results
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  305k  100  305k  100   124  63172     25  0:00:04  0:00:04 --:--:-- 81652
real 0m4.957s
user 0m0.038s
sys 0m0.024s
$ time curl 'https://internal-scheduler/api' \
    --data-binary '[1,"getTasksStatus",1,0,{"1":{"rec":{"8":{"rec":{"1":{"str":"test"}}},"9":{"str":"prod"},"2":{"str":"bigJob"}}}}]' \
    > /tmp/results
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.3M  100 10.3M  100   124  3670k     42  0:00:02  0:00:02 --:--:-- 3684k
real 0m2.904s
user 0m0.192s
sys 0m0.083s
{code}
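Still about 1.7x faster without compression (2.904s vs 4.957s), even though the
uncompressed response puts 10.3M on the wire instead of 305k. So we should
remove on-the-fly gzip compression for dynamic content.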
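For reference, the ratios behind these comparisons can be recomputed from the `time` and curl output above. This is a small Python sketch of the arithmetic only: the helper names are made up for illustration, and it assumes curl's `k` and `M` suffixes are 1024-based.

```python
def seconds(real):
    """Convert a `time` value such as '0m1.530s' to seconds."""
    minutes, rest = real.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


def speedup(slow, fast):
    """Return how many times faster `fast` is than `slow`."""
    return seconds(slow) / seconds(fast)


def size_ratio(uncompressed_mib, compressed_kib):
    """Ratio of transferred bytes, assuming 1024-based curl units."""
    return (uncompressed_mib * 1024.0) / compressed_kib


if __name__ == "__main__":
    # Local run: gzip vs no gzip.
    print("local speedup:  %.2fx" % speedup("0m1.530s", "0m0.297s"))
    # Production run: gzip vs no gzip.
    print("prod speedup:   %.2fx" % speedup("0m4.957s", "0m2.904s"))
    # Bytes on the wire in production: 10.3M uncompressed vs 305k compressed.
    print("size reduction: %.1fx" % size_ratio(10.3, 305))
```

So gzip shrinks the production response by roughly 35x, yet the compressed request still takes longer end to end.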
> Web interface has become slow, especially the job page
> ------------------------------------------------------
>
> Key: AURORA-458
> URL: https://issues.apache.org/jira/browse/AURORA-458
> Project: Aurora
> Issue Type: Bug
> Components: UI
> Reporter: Bill Farner
> Assignee: David McLaughlin
> Priority: Critical
> Attachments: Screen Shot 2014-05-22 at 11.42.24 AM.png, Screen Shot
> 2014-05-22 at 11.44.27 AM.png, scheduler-profile-curl.csv,
> scheduler-profile-curl.png, scheduler-profile.csv, scheduler-profile.png
>
>
> The web interface is noticeably more sluggish since the revamp. This is most
> noticeable for large jobs, where the job page may display a blank page for
> several seconds before showing anything useful. We need to adapt the API to
> reduce the amount of data fetched to render these pages.