Nearly a year ago now, when I started working on Launchpad (again :P), we faced a huge performance problem. We're over halfway there now: our request backstop is set to 9 seconds (with 3 overrides), down from 20 seconds. We have approximately the same number of requests failing a day - well under a tenth of a percent across the site.
This is pretty damn awesome! We've found and corrected a huge number of inefficient pages which simply did too many queries, and others which had mistakes in their SQL queries. We've also improved a number of query schemas. Needless to say, doing all this work has involved changes in some of our toolchain (such as allowing model level caching).

And a month or so back, when writing up the changes we made to our infrastructure to address the issues driving some aspects of our poor performance, I noted that we've crossed a significant perceptual threshold: we're no longer primarily perceived as slow.

This gives us the breathing room to look at the next major performance issue: our development cycle. A few things feed into this:

- It's getting harder to fix performance bugs simply: accessing 60K rows of cold data @ 2ms each is always going to be a 2 minute operation. We need more sophisticated solutions to handle the scale of some of our problems. Adding such solutions is tricky and often requires multiple iterations, but we can only iterate once a month due to downtime constraints.
- We have a code base where we routinely make changes with unexpected side effects, which hampers development. Sometimes they escape and become regressions (we added about a week of work in this way over the last 5 months).
- Running enough tests to be confident that the whole test suite will pass is really quite hard.
- Making reusable components is very tricky because of the tight coupling between our domain model and object persistence.

Many of these things have been discussed before. I have a proposal which I would like your joint help critically assessing. It is by *no means* a done deal nor finalised.

The proposal is the first of three documents I intend us to have on this (large) topic:

* The analysis / overview / business case.
* A vision that strips that analysis to its bare bones, establishes a framework for answering questions like 'should X be a service', and makes considered but opinionated choices about technology.
* A migration roadmap which identifies ordering, costs and benefits from the various things that go into a multigeneration massive migration.

In this proposal I have deliberately not made choices (such as 'rabbit vs xmlrpc vs restful json vs ...') which do not affect the overall discussion. I'm positive we'll have a fine old time deciding on different implementation choices; we should decide on the overall approach before making such choices though :) [what should we do, when should we do it and how should we do it... in that order when possible]

I've spoken to some of you already about this - thank you -very- much for your feedback on the proposal so far. I owe you all! The list at the top of the document is probably not complete - some of the ideas have been around (literally) for years.

With no further ado:

https://dev.launchpad.net/ArchitectureGuide/ServicesAnalysis

Please read this and do one of the following, depending on your personal preferences:

- comment in it
- reply to this thread
- reply to me privately

If the proposal survives this feedback process then I'll start digging into the juicy stuff - the other two documents I mention above.

-Rob