On Thu, Dec 10, 2009 at 9:50 AM, Owen Taylor <[email protected]> wrote:
> Got curious about performance on live.gnome.org; the observed macro
> pattern for system performance was:
>
> - Load is very spiky, sometimes low, sometimes quite high
> - When it's high, there are httpd processes running at high
>   CPU utilization or spending most of their time in syscalls.
> - Bottleneck seems to be CPU rather than disk - disk utilization
>   is quite low and is principally writes from httpd logging.
>
> Stracing the high-cpu and high-disk-wait httpd processes indicated that
> they were doing "strange things" - e.g., stat'ing through the page
> hierarchy looking for attachments for every page in the Wiki, so I
> wanted to know what requests they were processing.
>
> To try and figure this out, I temporarily modified the httpd
> configuration to include the processing time for each page in the log
> files and grep'ed out long-running page requests for half an hour of
> usage.
>
> There were 73 requests that took more than 10 seconds to process.
>
>  16 requests for /TitleIndex, min=20s, max=190s
>
>  18 requests for /WordIndex, min=35s, max=234s
>
>  25 requests for attachments, min=11s, max=38s
>     Most, though not all, of these were for large images,
>     500k or more
>
>  6 misc. POSTs (newaccount, login, edit, AttachFile), min=13s, max=66s
>
>  4 requests for wiki pages: min=16s, max=120s
>
>  2 requests for Category pages: min=12s, max=40s
>
>  1 request for AdvancedSearch: 11s
>
> I think it's fair to assume that the long times for attachments and in
> some cases for random pages are due to network issues - clients getting
> data slowly and tying up an httpd process. So, the thing that really
> stands out here is the /TitleIndex and /WordIndex requests - why are we
> getting all these requests for these expensive pages that aren't
> obviously linked to?
>
> So, let's look at the first three requests for /WordIndex:
>
>  IP: 195.27.20.2
>  Time: 10/Dec/2009:16:01:25 +0000
>  Request: GET /WordIndex?action=print HTTP/1.0"
>  Bytes: 1865069
>  Referrer: "-"
>  User agent: "Mozilla/4.0 (compatible;)"
>  Time: 168.302989
>
>  IP: 195.27.20.2
>  Time: 10/Dec/2009:16:01:25 +0000
>  Request: GET /WordIndex HTTP/1.0
>  Bytes: 1867640
>  Referrer: "http://live.gnome.org/Tomboy/PluginList"
>  User agent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
>     SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322;
>     .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; MS-RTC LM 8;
>     .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
>  Time: 179.532250
>
>  IP: 93.174.145.75
>  Time: 10/Dec/2009:16:03:39 +0000
>  Request: GET /WordIndex HTTP/1.1"
>  Bytes: 1867640
>  User agent: "Mozilla/4.0 (compatible;)"
>  Time: 52.201152
>
>  IP: 192.196.142.21
>  Time: 10/Dec/2009:16:03:36 +0000
>  Request: GET /WordIndex HTTP/1.1
>  Bytes: 1867640
>  User agent: "Mozilla/4.0 (compatible;)"
>  Time: 62.462147
>
> So, the thing that stands out here is the consistent User Agent for
> three out of the four, and the fact that the fourth request, while with
> a different agent, comes from the same IP at the same time as the first.
>
> If you do a web search, you'll find that this user agent is attributed
> to "Blue Coat" proxy server products, which apparently do
> speculative prefetching based on page contents.
>
> What page contents are they prefetching on? If you look at the source
> of one of the wiki pages, you see (e.g., for /GnomeShell):
>
>  <link rel="Start" href="/Home">
>  <link rel="Alternate" title="Wiki Markup" href="/GnomeShell?action=raw">
>  <link rel="Alternate" media="print" title="Print View"
>        href="/GnomeShell?action=print">
>  <link rel="Search" href="/FindPage">
>  <link rel="Index" href="/TitleIndex">
>  <link rel="Glossary" href="/WordIndex">
>  <link rel="Help" href="/HelpOnFormatting">
>
> There are in fact *no* obvious links to /TitleIndex and /WordIndex or to
> the printable versions of pages anywhere in the page, and I'm not aware
> of any current browsers that present these links in the user interface.
> So to summarize:
>
>  Our performance on live.gnome.org is being killed by speculative
>  prefetching on URLs that are added because they seemed like a good
>  idea but have no actual purpose on the page.
>
> Possible fixes:
>
>  - Block /TitleIndex and /WordIndex entirely - they aren't useful pages
>  - Block the Blue Coat fetches by User Agent (this, however, apparently
>    doesn't get all the prefetches; sometimes it uses the user agent
>    of the requesting client.)
>  - Use apache's mod_cache facilities to cache /TitleIndex, /WordIndex
>  - Patch Moin to omit this section of the pages
>
> Don't have a lot of opinion on which one of these or combination of
> these is best - the last one makes some sense to me.
>
> - Owen
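
The log analysis described in the quoted message could be reproduced with something like the sketch below. It is only an illustration under assumptions the message does not state: that the access log uses Apache's combined format with %D (request time in microseconds) appended as the last field, and that slow requests are grouped by path with the query string dropped. The first two sample lines are rebuilt from requests listed in the message with an assumed status of 200; the third line is made up, and the names LINE_RE, slow_requests and summarize are hypothetical.

```python
import re
from collections import defaultdict

# Assumed format: Apache "combined" log line with %D (microseconds) appended.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<usec>\d+)$'
)


def slow_requests(lines, threshold_s=10.0):
    """Yield (path, seconds, user_agent) for requests slower than threshold_s."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not fit the assumed format
        seconds = int(m.group("usec")) / 1_000_000
        if seconds > threshold_s:
            path = m.group("url").split("?", 1)[0]  # drop the query string
            yield path, seconds, m.group("agent")


def summarize(lines, threshold_s=10.0):
    """Return {path: (count, min_seconds, max_seconds)} for slow requests."""
    times = defaultdict(list)
    for path, seconds, _agent in slow_requests(lines, threshold_s):
        times[path].append(seconds)
    return {p: (len(t), min(t), max(t)) for p, t in times.items()}


SAMPLE = [
    # Rebuilt from the message; the 200 status is an assumption.
    '195.27.20.2 - - [10/Dec/2009:16:01:25 +0000] '
    '"GET /WordIndex?action=print HTTP/1.0" 200 1865069 '
    '"-" "Mozilla/4.0 (compatible;)" 168302989',
    '93.174.145.75 - - [10/Dec/2009:16:03:39 +0000] '
    '"GET /WordIndex HTTP/1.1" 200 1867640 '
    '"-" "Mozilla/4.0 (compatible;)" 52201152',
    # Entirely hypothetical fast request, included to show the filter.
    '192.0.2.1 - - [10/Dec/2009:16:04:00 +0000] '
    '"GET /Home HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0" 48000',
]

if __name__ == "__main__":
    for path, (count, lo, hi) in sorted(summarize(SAMPLE).items()):
        print(f"{count} requests for {path}, min={lo:.0f}s, max={hi:.0f}s")
```

Grouping by user agent instead of path (the third element yielded by slow_requests) would be one way to spot a consistent prefetching agent like the one noted in the message.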
Sorry Owen, I forgot to reply-all the first time.

The last one makes a lot of sense; however, it will require updating the
patch each time we upgrade MoinMoin. What are the downsides of just
blocking both of those URLs with a shiny GNOME 403 page? Besides it being
nifty to see those pages, is there any value in keeping them?

-- 
Jeff Schroeder

Don't drink and derive, alcohol and analysis don't mix.
http://www.digitalprognosis.com
_______________________________________________
gnome-infrastructure mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/gnome-infrastructure
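
For illustration, blocking the two index URLs with a 403 could look roughly like the sketch below. It assumes the wiki is served as a WSGI application, which may not match how live.gnome.org was deployed; the same effect could equally be had in the web server configuration. The names BlockExpensivePages, demo_app and status_for are hypothetical, and demo_app merely stands in for the real wiki application.

```python
# Paths named in the thread as expensive and not obviously linked to.
BLOCKED_PATHS = frozenset({"/TitleIndex", "/WordIndex"})


class BlockExpensivePages:
    """WSGI middleware that answers 403 for a fixed set of paths."""

    def __init__(self, app, blocked=BLOCKED_PATHS):
        self.app = app
        self.blocked = frozenset(blocked)

    def __call__(self, environ, start_response):
        # PATH_INFO excludes the query string, so /WordIndex?action=print
        # is matched as /WordIndex and blocked too.
        path = environ.get("PATH_INFO", "")
        if path in self.blocked:
            body = b"This index page has been disabled.\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the wiki application (hypothetical)."""
    body = b"wiki page\n"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def status_for(app, path):
    """Call a WSGI app for a GET on `path` and return the status line."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    b"".join(app(environ, start_response))  # consume the response body
    return captured["status"]


if __name__ == "__main__":
    wrapped = BlockExpensivePages(demo_app)
    for path in ("/WordIndex", "/TitleIndex", "/GnomeShell"):
        print(path, "->", status_for(wrapped, path))
```

Matching on the path rather than the user agent sidesteps the problem raised in the thread that some prefetches arrive with the requesting client's own user agent.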
