Re: [webkit-dev] PreloadScanner aggressiveness

2010-01-11 Thread Antti Koivisto
On Fri, Jan 8, 2010 at 3:00 AM, Mike Belshe m...@belshe.com wrote:

Nice testing!

But for HTTP, the key seems to be the pre-rendering-ready escape hatch in
 DocLoader::preload.  Removing this gives me almost all of the benefit.  The
 comment says it pretty clearly:  Don't preload images or body resources
 before we have something to draw. This prevents preloads from body delaying
 first display when bandwidth is limited.  For SPDY, there is more benefit
 by continuing to preparse aggressively - I suspect this is due to the finer
 grained prioritization where it can continue to send requests up without
 impacting the clogged downlink channel.


Yeah, this is currently really optimized for best first display, not for
total load time.

In my testing that escape hatch was pretty important for first display when
bandwidth was limited (several seconds on some major sites on 3G). It is not
surprising that it may somewhat hurt total load time in the high
bandwidth/high latency case.

Ideally we would have per-site estimates of the current bandwidth and latency
available so we could tune things like this dynamically.
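A policy switch along those lines could look roughly like the sketch below. All names and thresholds are made up for illustration; this is not WebKit code, just the shape of a per-site tuning decision:

```cpp
#include <cassert>

// Hypothetical: pick a preload strategy from per-site estimates of
// bandwidth (kbps) and round-trip time (ms).
enum class PreloadPolicy { ThrottleForFirstPaint, Aggressive };

PreloadPolicy choosePreloadPolicy(double bandwidthKbps, double rttMs)
{
    // On narrow pipes, sending body resources early delays first paint,
    // so keep the escape hatch.
    if (bandwidthKbps < 1000)
        return PreloadPolicy::ThrottleForFirstPaint;
    // On fat, high-latency links, discovering requests early mainly
    // helps total load time.
    if (rttMs >= 100)
        return PreloadPolicy::Aggressive;
    return PreloadPolicy::ThrottleForFirstPaint;
}
```

The thresholds would of course have to come from measurement, not guesswork.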

Any testing of changes here should consider first display times too, not
just total load time.


   antti
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


[webkit-dev] PreloadScanner aggressiveness

2010-01-07 Thread Mike Belshe
Hi -

I've been working on SPDY, but I think I may have found a good performance
win for HTTP.  Specifically, the PreloadScanner, which is responsible for
scanning ahead within an HTML document to find subresources, is throttled
today.  The throttling is intentional and probably sometimes necessary.
 Nonetheless, un-throttling it may lead to a 5-10% performance boost in some
configurations.  I believe Antti is no longer working on this?  Is there
anyone else working in this area that might have data on how aggressive the
PreloadScanner should be?  Below I'll describe some of my tests.

The PreloadScanner throttling happens in a couple of ways.  First, the
PreloadScanner only runs when we're blocked on JavaScript (see
HTMLTokenizer.cpp).  But further, as it discovers resources to be fetched,
it may delay or reject loading the subresource at all due to throttling in
loader.cpp and DocLoader.cpp.  The throttling is very important, depending
on the implementation of the HTTP networking stack, because throwing too
many resources (or the low-priority ones) into the network stack could
adversely affect HTTP load performance.  This latter problem does not impact
my Chromium tests, because the Chromium network stack does its own
prioritization and throttling (not too dissimilar from the work done by
loader.cpp).

*Theory*:
The theory I'm working under is that when the RTT of the network is
sufficiently high, the *best* thing the browser can do is to discover
resources as quickly as possible and pass them to the network layer so that
we can get started with fetching.  This is not speculative - these are
resources which will be required to render the full page.   The SPDY
protocol is designed around this concept - allowing the browser to schedule
all resources it needs to the network (rather than being throttled by
connection limits).  However, even with SPDY enabled, WebKit itself prevents
resource requests from fully flowing to the network layer in 3 ways:
   a) loader.cpp orders requests and defers requests based on the state of
the page load and a number of criteria.
   b) HTMLTokenizer.cpp only looks for resources further in the body when
we're blocked on JS
   c) preload requests are treated specially (docloader.cpp); if they are
discovered too early by the tokenizer, then they are either queued or
discarded.
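Gate (c) in particular can be modeled with a small sketch. This is purely illustrative (the real logic lives in DocLoader.cpp and loader.cpp, and the names here are invented): body resources discovered before anything has been painted are queued rather than dispatched.

```cpp
#include <cassert>
#include <deque>
#include <string>

enum class ResourceType { Script, Stylesheet, Image };

struct PreloadRequest {
    std::string url;
    ResourceType type;
};

// Hypothetical model of the preload escape hatch, not WebKit code.
class PreloadGate {
public:
    bool haveRenderedSomething = false; // the "something to draw" condition
    std::deque<PreloadRequest> deferred;

    // Returns true if the request may go to the network now.
    bool tryDispatch(const PreloadRequest& request)
    {
        // Images (and other body resources) discovered before first paint
        // are queued so they cannot delay first display.
        if (!haveRenderedSomething && request.type == ResourceType::Image) {
            deferred.push_back(request);
            return false;
        }
        return true;
    }
};
```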

*Test Case*
Can aggressive preloadscanning (e.g. always preload scan before parsing an
HTML Document) improve page load time?

To test this, I'm calling the PreloadScanner basically as the first part of
HTMLTokenizer::write().  I've then removed all throttling from loader.cpp
and DocLoader.cpp.  I've also instrumented the PreloadScanner to measure its
effectiveness.

*Benchmark Setup*
Windows client (chromium).
Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0% packet
loss.
I run through a set of 25 URLs, loading each 30 times; not recycling any
connections and clearing the cache between each page.
These are running over HTTP; there is no SPDY involved here.

*Results:*

                                   Baseline    Unthrottled
Average PLT                        2377ms      2239ms
Time spent in the PreloadScanner   1160ms      4540ms
Preload Scripts discovered         2621        9440
Preload CSS discovered             348         1022
Preload Images discovered          11952       39144
Preload items throttled            9983        0
Preload Complete hits              3803        6950
Preload Partial hits               1708        7230
Preload Unreferenced               42          130

Notes:
- Average PLT: +5.8% latency redux.
- Time in PreloadScanner: As expected, we spend about 4x more time in the
PreloadScanner. In this test, we loaded 750 pages, so it is about 6ms per
page. My machine is fast, though.
- Scripts/CSS/Images: 4x more scripts, 3x more CSS, and 3x more images
discovered.
- Complete hits: This is the count of items which were completely preloaded
before WebKit even tried to look them up in the cache. This is pure goodness.
- Partial hits: These are partial hits, where the item had already started
loading, but not finished, before WebKit tried to look them up.
- Unreferenced: These are bad and the count should be zero. I'll try to find
them and see if there isn't a fix - the PreloadScanner is just sometimes
finding resources that are never used. It is likely due to clever JS which
changes the DOM.
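The headline numbers are easy to sanity-check from the table (plain arithmetic on the reported figures; nothing here is WebKit code):

```cpp
#include <cassert>

// (2377 - 2239) / 2377 ~= 0.058, i.e. the quoted ~5.8% improvement.
double relativeImprovement(double baselineMs, double newMs)
{
    return (baselineMs - newMs) / baselineMs;
}

// 4540ms of scanner time over 750 page loads ~= 6ms per page.
double perPageCostMs(double totalMs, int pageLoads)
{
    return totalMs / pageLoads;
}
```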



*Conclusions:*
For this network speed/client processor, more aggressive PreloadScanning
clearly is a win.   More testing is needed for slower machines and other
network types.  I've tested many network types; the aggressive preload
scanning seems to always be either a win or a wash; for very slow network
connections, where we're already at capacity, the extra CPU burning is
basically free.  For super fast networks, with very low RTT, it also appears
to be a wash.  The networks in the middle (including mobile simulations) see
nice gains.

*Next Steps and Questions:*
I'd like to land my changes so that we can continue to gather data.  I can
enable these via macro definitions or I can enable these via dynamic
settings.  I can then try to do more A/B testing.

Are there any existing web pages which the WebKit team would like tested
under these configurations?  I don't see a lot of testing 

Re: [webkit-dev] PreloadScanner aggressiveness

2010-01-07 Thread Maciej Stachowiak


On Jan 7, 2010, at 12:09 PM, Mike Belshe wrote:


Hi -

I've been working on SPDY, but I think I may have found a good  
performance win for HTTP.  Specifically, if the PreloadScanner,  
which is responsible for scanning ahead within an HTML document to  
find subresources, is throttled today.  The throttling is  
intentional and probably sometimes necessary.  Nonetheless, un- 
throttling it may lead to a 5-10% performance boost in some  
configurations.  I believe Antti is no longer working on this?  Is  
there anyone else working in this area that might have data on how  
aggressive the PreloadScanner should be?  Below I'll describe some  
of my tests.


The PreloadScanner throttling happens in a couple of ways.  First,  
the PreloadScanner only runs when we're blocked on JavaScript (see  
HTMLTokenizer.cpp).  But further, as it discovers resources to be  
fetched, it may delay or reject loading the subresource at all due  
to throttling in loader.cpp and DocLoader.cpp.  The throttling is  
very important, depending on the implementation of the HTTP  
networking stack, because throwing too many resources (or the low- 
priority ones) into the network stack could adversely affect HTTP  
load performance.  This latter problem does not impact my Chromium  
tests, because the Chromium network stack does its own  
prioritization and throttling (not too dissimilar from the work done  
by loader.cpp).


The reason we do this is to prevent head-of-line blocking by low- 
priority resources inside the network stack (mainly considering how  
CFNetwork / NSURLConnection works).




Theory:
The theory I'm working under is that when the RTT of the network is  
sufficiently high, the *best* thing the browser can do is to  
discover resources as quickly as possible and pass them to the  
network layer so that we can get started with fetching.  This is not  
speculative - these are resources which will be required to render  
the full page.   The SPDY protocol is designed around this concept -  
allowing the browser to schedule all resources it needs to the  
network (rather than being throttled by connection limits).   
However, even with SPDY enabled, WebKit itself prevents resource  
requests from fully flowing to the network layer in 3 ways:
   a) loader.cpp orders requests and defers requests based on the  
state of the page load and a number of criteria.
   b) HTMLTokenizer.cpp only looks for resources further in the body  
when we're blocked on JS
   c) preload requests are treated specially (docloader.cpp); if  
they are discovered too early by the tokenizer, then they are either  
queued or discarded.


I think your theory is correct when SPDY is enabled, and possibly when  
using HTTP with pipelining. It may be true to a lesser extent with non- 
pipelining HTTP implementations when the network stack does its own  
prioritization and throttling, by reducing latency in getting the  
request to the network stack. This is especially so when issuing a  
network request to the network stack may involve significant latency  
due to IPC or cross-thread communication or the like.




Test Case
Can aggressive preloadscanning (e.g. always preload scan before  
parsing an HTML Document) improve page load time?


To test this, I'm calling the PreloadScanner basically as the first  
part of HTMLTokenizer::write().  I've then removed all throttling  
from loader.cpp and DocLoader.cpp.  I've also instrumented the  
PreloadScanner to measure its effectiveness.


Benchmark Setup
Windows client (chromium).
Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0%  
packet loss.
I run through a set of 25 URLs, loading each 30 times; not recycling  
any connections and clearing the cache between each page.

These are running over HTTP; there is no SPDY involved here.


I'm interested in the following:

- What kind of results do you get in Safari?
- How much of this effect is due to more aggressive preload scanning  
and how much is due to disabling throttling? Since the test includes  
multiple logically independent changes, it is hard to tell which are
the ones that had an effect.




Results:
                                   Baseline    Unthrottled
Average PLT                        2377ms      2239ms
Time spent in the PreloadScanner   1160ms      4540ms
Preload Scripts discovered         2621        9440
Preload CSS discovered             348         1022
Preload Images discovered          11952       39144
Preload items throttled            9983        0
Preload Complete hits              3803        6950
Preload Partial hits               1708        7230

Notes:
- Average PLT: +5.8% latency redux.
- Time in PreloadScanner: As expected, we spend about 4x more time in the
PreloadScanner. In this test, we loaded 750 pages, so it is about 6ms per
page. My machine is fast, though.
- Complete hits: This is the count of items which were completely preloaded
before WebKit even tried to look them up in the cache. This is pure goodness.
- Partial hits: These are partial hits, where the item had already started

Re: [webkit-dev] PreloadScanner aggressiveness

2010-01-07 Thread Joe Mason
I don't think every port should be required to implement prioritization
and throttling itself - that's just duplication of effort.  Maybe
there's a good middle-ground, where PreloadScanner is run more often but
still does the priority sorting?
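One way to sketch that middle ground is a priority-ordered handoff: keep scanning aggressively, but dispatch discovered resources highest-priority first. This is purely illustrative; loader.cpp's actual priority rules are more involved, and the priorities here are invented:

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>

enum class Priority { Low = 0, Medium = 1, High = 2 };

struct Preload {
    std::string url;
    Priority priority;
    // std::priority_queue is a max-heap, so this yields High first.
    bool operator<(const Preload& other) const
    {
        return priority < other.priority;
    }
};

// Hypothetical: preloads accumulate here, and the network layer pulls
// them in priority order instead of discovery order.
class SortingPreloadQueue {
    std::priority_queue<Preload> m_queue;
public:
    void add(std::string url, Priority priority)
    {
        m_queue.push({std::move(url), priority});
    }
    Preload next()
    {
        Preload top = m_queue.top();
        m_queue.pop();
        return top;
    }
    bool empty() const { return m_queue.empty(); }
};
```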



Joe



From: webkit-dev-boun...@lists.webkit.org
[mailto:webkit-dev-boun...@lists.webkit.org] On Behalf Of Mike Belshe
Sent: Thursday, January 07, 2010 3:09 PM
To: webkit-dev@lists.webkit.org
Subject: [webkit-dev] PreloadScanner aggressiveness



Hi -


I've been working on SPDY, but I think I may have found a good
performance win for HTTP.  Specifically, if the PreloadScanner, which is
responsible for scanning ahead within an HTML document to find
subresources, is throttled today.  The throttling is intentional and
probably sometimes necessary.  Nonetheless, un-throttling it may lead to
a 5-10% performance boost in some configurations.  I believe Antti is no
longer working on this?  Is there anyone else working in this area that
might have data on how aggressive the PreloadScanner should be?  Below
I'll describe some of my tests.



The PreloadScanner throttling happens in a couple of ways.  First, the
PreloadScanner only runs when we're blocked on JavaScript (see
HTMLTokenizer.cpp).  But further, as it discovers resources to be
fetched, it may delay or reject loading the subresource at all due to
throttling in loader.cpp and DocLoader.cpp.  The throttling is very
important, depending on the implementation of the HTTP networking stack,
because throwing too many resources (or the low-priority ones) into the
network stack could adversely affect HTTP load performance.  This latter
problem does not impact my Chromium tests, because the Chromium network
stack does its own prioritization and throttling (not too dissimilar
from the work done by loader.cpp).



Theory:

The theory I'm working under is that when the RTT of the network is
sufficiently high, the *best* thing the browser can do is to discover
resources as quickly as possible and pass them to the network layer so
that we can get started with fetching.  This is not speculative - these
are resources which will be required to render the full page.   The SPDY
protocol is designed around this concept - allowing the browser to
schedule all resources it needs to the network (rather than being
throttled by connection limits).  However, even with SPDY enabled,
WebKit itself prevents resource requests from fully flowing to the
network layer in 3 ways:

   a) loader.cpp orders requests and defers requests based on the state
of the page load and a number of criteria.

   b) HTMLTokenizer.cpp only looks for resources further in the body
when we're blocked on JS

   c) preload requests are treated specially (docloader.cpp); if they
are discovered too early by the tokenizer, then they are either queued
or discarded.



Test Case

Can aggressive preloadscanning (e.g. always preload scan before parsing
an HTML Document) improve page load time?



To test this, I'm calling the PreloadScanner basically as the first part
of HTMLTokenizer::write().  I've then removed all throttling from
loader.cpp and DocLoader.cpp.  I've also instrumented the PreloadScanner
to measure its effectiveness.



Benchmark Setup

Windows client (chromium).

Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0%
packet loss.

I run through a set of 25 URLs, loading each 30 times; not recycling any
connections and clearing the cache between each page.

These are running over HTTP; there is no SPDY involved here.



Results:

                                   Baseline    Unthrottled
Average PLT                        2377ms      2239ms
Time spent in the PreloadScanner   1160ms      4540ms
Preload Scripts discovered         2621        9440
Preload CSS discovered             348         1022
Preload Images discovered          11952       39144
Preload items throttled            9983        0
Preload Complete hits              3803        6950
Preload Partial hits               1708        7230
Preload Unreferenced               42          130

Notes:
- Average PLT: +5.8% latency redux.
- Time in PreloadScanner: As expected, we spend about 4x more time in the
PreloadScanner. In this test, we loaded 750 pages, so it is about 6ms per
page. My machine is fast, though.
- Scripts/CSS/Images: 4x more scripts, 3x more CSS, and 3x more images
discovered.
- Complete hits: This is the count of items which were completely preloaded
before WebKit even tried to look them up in the cache. This is pure goodness.
- Partial hits: These are partial hits, where the item had already started
loading, but not finished, before WebKit tried to look them up.
- Unreferenced: These are bad and the count should be zero. I'll try to find
them and see if there isn't a fix - the PreloadScanner is just sometimes
finding resources that are never used. It is likely due to clever JS which
changes the DOM.







Conclusions:

For this network speed/client processor, more aggressive PreloadScanning
clearly is a win.   More testing is needed for slower machines and other
network types.  I've tested many network types; the aggressive preload
scanning seems to always be either a win or a wash; for very slow

Re: [webkit-dev] PreloadScanner aggressiveness

2010-01-07 Thread Mike Belshe
On Thu, Jan 7, 2010 at 12:49 PM, Maciej Stachowiak m...@apple.com wrote:


 On Jan 7, 2010, at 12:09 PM, Mike Belshe wrote:

 Hi -

 I've been working on SPDY, but I think I may have found a good performance
 win for HTTP.  Specifically, if the PreloadScanner, which is responsible for
 scanning ahead within an HTML document to find subresources, is throttled
 today.  The throttling is intentional and probably sometimes necessary.
  Nonetheless, un-throttling it may lead to a 5-10% performance boost in some
 configurations.  I believe Antti is no longer working on this?  Is there
 anyone else working in this area that might have data on how aggressive the
 PreloadScanner should be?  Below I'll describe some of my tests.

 The PreloadScanner throttling happens in a couple of ways.  First, the
 PreloadScanner only runs when we're blocked on JavaScript (see
 HTMLTokenizer.cpp).  But further, as it discovers resources to be fetched,
 it may delay or reject loading the subresource at all due to throttling in
 loader.cpp and DocLoader.cpp.  The throttling is very important, depending
 on the implementation of the HTTP networking stack, because throwing too
 many resources (or the low-priority ones) into the network stack could
 adversely affect HTTP load performance.  This latter problem does not impact
 my Chromium tests, because the Chromium network stack does its own
 prioritization and throttling (not too dissimilar from the work done by
 loader.cpp).


 The reason we do this is to prevent head-of-line blocking by low-priority
 resources inside the network stack (mainly considering how CFNetwork /
 NSURLConnection works).


Right - understood.




 *Theory*:
 The theory I'm working under is that when the RTT of the network is
 sufficiently high, the *best* thing the browser can do is to discover
 resources as quickly as possible and pass them to the network layer so that
 we can get started with fetching.  This is not speculative - these are
 resources which will be required to render the full page.   The SPDY
 protocol is designed around this concept - allowing the browser to schedule
 all resources it needs to the network (rather than being throttled by
 connection limits).  However, even with SPDY enabled, WebKit itself prevents
 resource requests from fully flowing to the network layer in 3 ways:
a) loader.cpp orders requests and defers requests based on the state of
 the page load and a number of criteria.
b) HTMLTokenizer.cpp only looks for resources further in the body when
 we're blocked on JS
c) preload requests are treated specially (docloader.cpp); if they are
 discovered too early by the tokenizer, then they are either queued or
 discarded.


 I think your theory is correct when SPDY is enabled, and possibly when
 using HTTP with pipelining. It may be true to a lesser extent with
 non-pipelining HTTP implementations when the network stack does its own
 prioritization and throttling, by reducing latency in getting the request to
 the network stack.


right.


 This is especially so when issuing a network request to the network stack
 may involve significant latency due to IPC or cross-thread communication or
 the like.


I hadn't considered IPC or cross thread latencies.  When I've measured these
in the past they are very very low.  One problem with the single-threaded
nature of our preloader and parser right now is that if the HTMLTokenizer is
in the middle of executing JS code, we're not doing anything to scan for
preloads; tons of data can be flowing in off the network which we're
oblivious to.  I'm not trying to change this for now, though; it's much more
involved, I think, due to thread safety requirements for the WebCore cache.
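For illustration only, the kind of locked handoff such a background scanner would need might look like the sketch below. This is hypothetical; nothing like it exists in WebCore, and the hard part (thread-safe cache lookups) is deliberately left on the main thread:

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a scanner thread pushes discovered URLs; the main thread
// drains them at a safe point and performs the actual cache lookups,
// since the WebCore cache itself is not thread-safe.
class PreloadHandoff {
    std::mutex m_mutex;
    std::queue<std::string> m_discovered;
public:
    void push(std::string url)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_discovered.push(std::move(url));
    }
    // Called on the main thread; returns everything discovered so far.
    std::vector<std::string> drain()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<std::string> urls;
        while (!m_discovered.empty()) {
            urls.push_back(std::move(m_discovered.front()));
            m_discovered.pop();
        }
        return urls;
    }
};
```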




 *Test Case*
 Can aggressive preloadscanning (e.g. always preload scan before parsing an
 HTML Document) improve page load time?

 To test this, I'm calling the PreloadScanner basically as the first part of
 HTMLTokenizer::write().  I've then removed all throttling from loader.cpp
 and DocLoader.cpp.  I've also instrumented the PreloadScanner to measure its
 effectiveness.

 *Benchmark Setup*
 Windows client (chromium).
 Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0% packet
 loss.
 I run through a set of 25 URLs, loading each 30 times; not recycling any
 connections and clearing the cache between each page.
 These are running over HTTP; there is no SPDY involved here.


 I'm interested in the following:

 - What kind of results do you get in Safari?


I've not done much benchmarking in Safari; do you have a good way to do
this?  Is there something I can read about or tools I can use?

For chromium, I use the benchmarking extension which lets me run through
lots of pages quickly.



 - How much of this effect is due to more aggressive preload scanning and
 how much is due to disabling throttling? Since the test includes multiple
 logically independent changes, it is hard to tell which are the ones that had
 an effect.


Great 

Re: [webkit-dev] PreloadScanner aggressiveness

2010-01-07 Thread Mike Belshe
On Thu, Jan 7, 2010 at 12:52 PM, Joe Mason jma...@rim.com wrote:

  I don't think every port should be required to implement prioritization
 and throttling itself - that's just duplication of effort.

I agree.  I wasn't thinking of turning this on globally; rather thinking
about how to turn it on selectively for ports that want it.

Mike



 Maybe there's a good middle-ground, where PreloadScanner is run more often
 but still does the priority sorting?



 Joe



 *From:* webkit-dev-boun...@lists.webkit.org [mailto:
 webkit-dev-boun...@lists.webkit.org] *On Behalf Of *Mike Belshe
 *Sent:* Thursday, January 07, 2010 3:09 PM
 *To:* webkit-dev@lists.webkit.org
 *Subject:* [webkit-dev] PreloadScanner aggressiveness



 Hi -


 I've been working on SPDY, but I think I may have found a good performance
 win for HTTP.  Specifically, if the PreloadScanner, which is responsible for
 scanning ahead within an HTML document to find subresources, is throttled
 today.  The throttling is intentional and probably sometimes necessary.
  Nonetheless, un-throttling it may lead to a 5-10% performance boost in some
 configurations.  I believe Antti is no longer working on this?  Is there
 anyone else working in this area that might have data on how aggressive the
 PreloadScanner should be?  Below I'll describe some of my tests.



 The PreloadScanner throttling happens in a couple of ways.  First, the
 PreloadScanner only runs when we're blocked on JavaScript (see
 HTMLTokenizer.cpp).  But further, as it discovers resources to be fetched,
 it may delay or reject loading the subresource at all due to throttling in
 loader.cpp and DocLoader.cpp.  The throttling is very important, depending
 on the implementation of the HTTP networking stack, because throwing too
 many resources (or the low-priority ones) into the network stack could
 adversely affect HTTP load performance.  This latter problem does not impact
 my Chromium tests, because the Chromium network stack does its own
 prioritization and throttling (not too dissimilar from the work done by
 loader.cpp).



 *Theory*:

 The theory I'm working under is that when the RTT of the network is
 sufficiently high, the *best* thing the browser can do is to discover
 resources as quickly as possible and pass them to the network layer so that
 we can get started with fetching.  This is not speculative - these are
 resources which will be required to render the full page.   The SPDY
 protocol is designed around this concept - allowing the browser to schedule
 all resources it needs to the network (rather than being throttled by
 connection limits).  However, even with SPDY enabled, WebKit itself prevents
 resource requests from fully flowing to the network layer in 3 ways:

a) loader.cpp orders requests and defers requests based on the state of
 the page load and a number of criteria.

b) HTMLTokenizer.cpp only looks for resources further in the body when
 we're blocked on JS

c) preload requests are treated specially (docloader.cpp); if they are
 discovered too early by the tokenizer, then they are either queued or
 discarded.



 *Test Case*

 Can aggressive preloadscanning (e.g. always preload scan before parsing an
 HTML Document) improve page load time?



 To test this, I'm calling the PreloadScanner basically as the first part of
 HTMLTokenizer::write().  I've then removed all throttling from loader.cpp
 and DocLoader.cpp.  I've also instrumented the PreloadScanner to measure its
 effectiveness.



 *Benchmark Setup*

 Windows client (chromium).

 Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0% packet
 loss.

 I run through a set of 25 URLs, loading each 30 times; not recycling any
 connections and clearing the cache between each page.

 These are running over HTTP; there is no SPDY involved here.



 *Results:*

                                    Baseline    Unthrottled
 Average PLT                        2377ms      2239ms
 Time spent in the PreloadScanner   1160ms      4540ms
 Preload Scripts discovered         2621        9440
 Preload CSS discovered             348         1022
 Preload Images discovered          11952       39144
 Preload items throttled            9983        0
 Preload Complete hits              3803        6950
 Preload Partial hits               1708        7230
 Preload Unreferenced               42          130

 Notes:
 - Average PLT: +5.8% latency redux.
 - Time in PreloadScanner: As expected, we spend about 4x more time in the
 PreloadScanner. In this test, we loaded 750 pages, so it is about 6ms per
 page. My machine is fast, though.
 - Scripts/CSS/Images: 4x more scripts, 3x more CSS, and 3x more images
 discovered.
 - Complete hits: This is the count of items which were completely preloaded
 before WebKit even tried to look them up in the cache. This is pure goodness.
 - Partial hits: These are partial hits, where the item had already started
 loading, but not finished, before WebKit tried to look them up.
 - Unreferenced: These are bad and the count should be zero. I'll try to find
 them and see if there isn't a fix - the PreloadScanner is just sometimes
 finding resources that are never used. It is likely due to clever JS