Re: [webkit-dev] PreloadScanner aggressiveness
On Fri, Jan 8, 2010 at 3:00 AM, Mike Belshe <m...@belshe.com> wrote:

> Nice testing! But for HTTP, the key seems to be the pre-rendering-ready
> escape hatch in DocLoader::preload. Removing this gives me almost all of
> the benefit. The comment says it pretty clearly: "Don't preload images or
> body resources before we have something to draw. This prevents preloads
> from body delaying first display when bandwidth is limited." For SPDY,
> there is more benefit in continuing to preparse aggressively - I suspect
> this is due to the finer-grained prioritization, which lets it continue to
> send requests up without impacting the clogged downlink channel.

Yeah, this is currently really optimized for best first display, not for total load time. In my testing that escape hatch was pretty important for first display in the bandwidth-limited case (several seconds on some major sites on 3G). It is not surprising that it may somewhat hurt total load time in the high-bandwidth/high-latency case.

Ideally we would have per-site estimates of the current bandwidth and latency available, so we could tune things like this dynamically. Any testing of changes here should consider first display times too, not just total load time.

  antti
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
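The "escape hatch" Antti describes is essentially a gate on body resources before first paint. Below is a minimal sketch of that kind of policy; the names are hypothetical and the real logic in DocLoader::preload is more involved:

```cpp
#include <cassert>

// Simplified resource categories; WebKit's real types are richer.
enum class ResourceType { Script, Stylesheet, Image, Font };

// Sketch of the "don't preload images or body resources before we have
// something to draw" gate: until first paint, only scripts and stylesheets
// are allowed through, so body preloads cannot steal bandwidth from first
// display on a constrained link. Names here are invented for illustration.
bool shouldPreloadNow(ResourceType type, bool haveRenderedSomething)
{
    if (haveRenderedSomething)
        return true; // After first paint, preload everything discovered.
    return type == ResourceType::Script || type == ResourceType::Stylesheet;
}
```

Removing this check (as in Mike's experiment) makes every discovered resource eligible immediately, which helps total load time but can hurt first display when bandwidth is the bottleneck.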
[webkit-dev] PreloadScanner aggressiveness
Hi - I've been working on SPDY, but I think I may have found a good performance win for HTTP. Specifically, the PreloadScanner, which is responsible for scanning ahead within an HTML document to find subresources, is throttled today. The throttling is intentional and probably sometimes necessary. Nonetheless, un-throttling it may lead to a 5-10% performance boost in some configurations. I believe Antti is no longer working on this? Is there anyone else working in this area that might have data on how aggressive the PreloadScanner should be? Below I'll describe some of my tests.

The PreloadScanner throttling happens in a couple of ways. First, the PreloadScanner only runs when we're blocked on JavaScript (see HTMLTokenizer.cpp). But further, as it discovers resources to be fetched, it may delay or reject loading the subresource at all, due to throttling in loader.cpp and DocLoader.cpp. The throttling is very important, depending on the implementation of the HTTP networking stack, because throwing too many resources (or the low-priority ones) into the network stack could adversely affect HTTP load performance. This latter problem does not impact my Chromium tests, because the Chromium network stack does its own prioritization and throttling (not too dissimilar from the work done by loader.cpp).

*Theory*: The theory I'm working under is that when the RTT of the network is sufficiently high, the *best* thing the browser can do is to discover resources as quickly as possible and pass them to the network layer so that we can get started with fetching. This is not speculative - these are resources which will be required to render the full page. The SPDY protocol is designed around this concept - allowing the browser to schedule all the resources it needs onto the network (rather than being throttled by connection limits).
However, even with SPDY enabled, WebKit itself prevents resource requests from fully flowing to the network layer in 3 ways:
a) loader.cpp orders requests and defers requests based on the state of the page load and a number of criteria.
b) HTMLTokenizer.cpp only looks for resources further in the body when we're blocked on JS.
c) Preload requests are treated specially (DocLoader.cpp); if they are discovered too early by the tokenizer, they are either queued or discarded.

*Test Case*
Can aggressive preload scanning (e.g. always preload scan before parsing an HTML document) improve page load time? To test this, I'm calling the PreloadScanner basically as the first part of HTMLTokenizer::write(). I've then removed all throttling from loader.cpp and DocLoader.cpp. I've also instrumented the PreloadScanner to measure its effectiveness.

*Benchmark Setup*
Windows client (Chromium). Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0% packet loss. I run through a set of 25 URLs, loading each 30 times, not recycling any connections and clearing the cache between each page. These are running over HTTP; there is no SPDY involved here.

*Results:*

                                  Baseline (without
                                  my changes)        Unthrottled   Notes
Average PLT                       2377ms             2239ms        +5.8% latency reduction.
Time spent in the PreloadScanner  1160ms             4540ms        As expected, about 4x more time spent in the
                                                                   PreloadScanner. We loaded 750 pages in this test,
                                                                   so it is about 6ms per page. My machine is fast,
                                                                   though.
Preload scripts discovered        2621               9440          4x more scripts discovered.
Preload CSS discovered            348                1022          3x more CSS discovered.
Preload images discovered         11952              39144         3x more images discovered.
Preload items throttled           9983               0
Preload complete hits             3803               6950          Items which were completely preloaded before WebKit
                                                                   even tried to look them up in the cache. This is
                                                                   pure goodness.
Preload partial hits              1708               7230          Items which had already started loading, but not
                                                                   finished, before WebKit tried to look them up.
Preload unreferenced              42                 130           These are bad and the count should be zero. I'll
                                                                   try to find them and see if there isn't a fix - the
                                                                   PreloadScanner is just sometimes finding resources
                                                                   that are never used, likely due to clever JS which
                                                                   changes the DOM.

*Conclusions:*
For this network speed/client processor, more aggressive preload scanning is clearly a win. More testing is needed for slower machines and other network types. I've tested many network types; the aggressive preload scanning seems to always be either a win or a wash. For very slow network connections, where we're already at capacity, the extra CPU burn is basically free. For super fast networks, with very low RTT, it also appears to be a wash. The networks in the middle (including mobile simulations) see nice gains.

*Next Steps and Questions:*
I'd like to land my changes so that we can continue to gather data. I can enable these via macro definitions or via dynamic settings, and then try to do more A/B testing. Are there any existing web pages which the WebKit team would like tested under these configurations? I don't see a lot of testing
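The two scanning policies compared above can be modeled with a toy counter. This is purely illustrative (Chunk and both functions are invented for the example); it just shows why eager scanning surfaces far more of a page's subresources to the preloader, matching the 3-4x jumps in the discovery counts:

```cpp
#include <vector>

// A "chunk" is a piece of HTML arriving from the network; blockedOnScript
// marks chunks that arrive while the parser is stalled on a <script>.
struct Chunk {
    int subresources;     // subresources discoverable in this chunk
    bool blockedOnScript; // was the parser blocked on JS when it arrived?
};

// Baseline policy: the preload scanner only runs while the parser is
// blocked on JavaScript, so it only ever sees those chunks; everything
// else is discovered later, by the parser itself.
int discoveredByBaselineScanner(const std::vector<Chunk>& chunks)
{
    int found = 0;
    for (const Chunk& c : chunks)
        if (c.blockedOnScript)
            found += c.subresources;
    return found;
}

// Unthrottled policy: scan every chunk up front (as the first part of
// HTMLTokenizer::write() in the experiment), so the scanner sees every
// subresource as early as possible.
int discoveredByEagerScanner(const std::vector<Chunk>& chunks)
{
    int found = 0;
    for (const Chunk& c : chunks)
        found += c.subresources;
    return found;
}
```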
Re: [webkit-dev] PreloadScanner aggressiveness
On Jan 7, 2010, at 12:09 PM, Mike Belshe wrote:

> The throttling is very important, depending on the implementation of the
> HTTP networking stack, because throwing too many resources (or the
> low-priority ones) into the network stack could adversely affect HTTP load
> performance. [...]

The reason we do this is to prevent head-of-line blocking by low-priority resources inside the network stack (mainly considering how CFNetwork / NSURLConnection works).

> Theory: The theory I'm working under is that when the RTT of the network
> is sufficiently high, the *best* thing the browser can do is to discover
> resources as quickly as possible and pass them to the network layer so
> that we can get started with fetching. [...]

I think your theory is correct when SPDY is enabled, and possibly when using HTTP with pipelining. It may be true to a lesser extent with non-pipelining HTTP implementations when the network stack does its own prioritization and throttling, by reducing latency in getting the request to the network stack. This is especially so when issuing a network request to the network stack may involve significant latency due to IPC or cross-thread communication or the like.

> Benchmark Setup: Windows client (Chromium). Simulated network with 4Mbps
> download, 1Mbps upload, 100ms RTT, 0% packet loss. I run through a set of
> 25 URLs, loading each 30 times; not recycling any connections and clearing
> the cache between each page. These are running over HTTP; there is no SPDY
> involved here. [...]

I'm interested in the following:
- What kind of results do you get in Safari?
- How much of this effect is due to more aggressive preload scanning, and how much is due to disabling throttling? Since the test includes multiple logically independent changes, it is hard to tell which are the ones that had an effect.
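Maciej's head-of-line-blocking concern can be illustrated with a toy dispatch model: with a small per-host connection limit and FIFO dispatch, early-discovered low-priority images can occupy every connection ahead of a critical script. All names and priority values below are invented for the sketch; this is not loader.cpp's actual algorithm:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A pending subresource request, in the order the scanner discovered it.
struct Request {
    std::string url;
    int priority; // higher = more important (e.g. scripts over images)
};

// Decide which requests go out on the first `connectionLimit` connections.
// With prioritize == false (FIFO), whatever was discovered first wins the
// connections; with prioritize == true, important resources jump the queue,
// which is the head-of-line-blocking protection the throttling provides.
std::vector<std::string> firstBatch(std::vector<Request> pending,
                                    std::size_t connectionLimit,
                                    bool prioritize)
{
    if (prioritize) {
        std::stable_sort(pending.begin(), pending.end(),
                         [](const Request& a, const Request& b) {
                             return a.priority > b.priority;
                         });
    }
    std::vector<std::string> out;
    for (std::size_t i = 0; i < pending.size() && i < connectionLimit; ++i)
        out.push_back(pending[i].url);
    return out;
}
```

A network stack that prioritizes internally (as Chromium's does) can safely take the whole list; one that dispatches FIFO (the CFNetwork-style concern) benefits from WebKit holding back low-priority requests.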
Re: [webkit-dev] PreloadScanner aggressiveness
I don't think every port should be required to implement prioritization and throttling itself - that's just duplication of effort. Maybe there's a good middle ground, where the PreloadScanner is run more often but still does the priority sorting?

Joe

From: webkit-dev-boun...@lists.webkit.org [mailto:webkit-dev-boun...@lists.webkit.org] On Behalf Of Mike Belshe
Sent: Thursday, January 07, 2010 3:09 PM
To: webkit-dev@lists.webkit.org
Subject: [webkit-dev] PreloadScanner aggressiveness

> Hi - I've been working on SPDY, but I think I may have found a good
> performance win for HTTP. [...]
Re: [webkit-dev] PreloadScanner aggressiveness
On Thu, Jan 7, 2010 at 12:49 PM, Maciej Stachowiak <m...@apple.com> wrote:

> The reason we do this is to prevent head-of-line blocking by low-priority
> resources inside the network stack (mainly considering how CFNetwork /
> NSURLConnection works).

Right - understood.

> I think your theory is correct when SPDY is enabled, and possibly when
> using HTTP with pipelining. It may be true to a lesser extent with
> non-pipelining HTTP implementations when the network stack does its own
> prioritization and throttling, by reducing latency in getting the request
> to the network stack.

Right.

> This is especially so when issuing a network request to the network stack
> may involve significant latency due to IPC or cross-thread communication
> or the like.

I hadn't considered IPC or cross-thread latencies. When I've measured these in the past they are very, very low. One problem with the single-threaded nature of our preloader and parser right now is that if the HTMLTokenizer is in the middle of executing JS code, we're not doing anything to scan for preloads; tons of data can be flowing in off the network which we're oblivious to. I'm not trying to change this for now, though - it's much more involved, I think, due to thread-safety requirements for the WebCore cache.

> I'm interested in the following:
> - What kind of results do you get in Safari?

I've not done much benchmarking in Safari; do you have a good way to do this? Is there something I can read about, or tools I can use? For Chromium, I use the benchmarking extension, which lets me run through lots of pages quickly.

> - How much of this effect is due to more aggressive preload scanning and
>   how much is due to disabling throttling? Since the test includes
>   multiple logically independent changes, it is hard to tell which are the
>   ones that had an effect.

Great
Re: [webkit-dev] PreloadScanner aggressiveness
On Thu, Jan 7, 2010 at 12:52 PM, Joe Mason <jma...@rim.com> wrote:

> I don't think every port should be required to implement prioritization
> and throttling itself - that's just duplication of effort.

I agree. I wasn't thinking of turning this on globally; rather, thinking about how to turn it on selectively for ports that want it.

Mike

> Maybe there's a good middle ground, where the PreloadScanner is run more
> often but still does the priority sorting?
>
> Joe
>
> [... quoted original message trimmed ...]