Hi Robert, This sounds like an ambitious project that could positively affect our overall run-time performance.
You said that our ActiveRecord pattern is at the source of our API (I took it as REST API) slowness. I wonder how this new architecture would map externally. Leonard, maybe you have an idea? Cheers -- Francis J. Lacoste francis.laco...@canonical.com On October 31, 2010, Robert Collins wrote: > I've a new structure for our code that I'd like to propose. There's a > bunch of things that intersect here, and I'm *massively* tired from an > excellent, but intense UDS + travel... so apologies in advance for any > incoherence in what follows. > > We face many problems in our code with optimising pages. Some specific ones > are: - impedance mismatch between sql-optimised helpers and code on a > 'model' object - small changes can cause death-by-queries when a > previously untuned object is added to a template and triggers per-row > queries. > > I and others have been talking round some ideas about this for a > while. One idea is a High Level Query Language and introducing an > explicit mapping layer. Another is pushing much more eager loading > facilities into and through our ORM - Storm. > > Both of those things are good things to do but are either hard, > incomplete or potentially high risk. That doesn't mean we shouldn't > work on them! > > However, I think a more important and fundamental aspect of the > problem is that the pattern we use - and which many ORMs use, is > inclined *by design* to grow complexity and trigger DB queries at just > the wrong time. That pattern is the "Active Record" pattern, if you > have the PoEAA book (my copy is in a box...somewhere). Basically it > says that a single object represents a single row derived from the > database, and that changes to it are pushed to the database, and > attribute access on it traverses the object graph from the database. > > This pattern affects us in several ways: > - its the root cause of our death-by-queries behaviour... > particularly the way that previously tuned pages regress. > - its the root cause for many performance problems in our APIs > > So I have an idea that I think shows some promise. Its at least > superficially similar to the 'RecordSet' pattern (PoEAA again), but > I'm going to define it entirely here to avoid confusion, because > RecordSet may actually be different ( I can't look at the full details > until I find my PoEAA book), and finally because RecordSet interacts > with other patterns that we're not really in a place to implement > without almost-certain significant disruption deep in our stack. Its > also related to the existing Colllection idioms, but addresses > multiple types and also our default behaviours. > > Lets call this a Group based approach for now (because 'Sets' have an > over-used history in Launchpad). > > Firstly, an example of what code using this might look like (typically > part of this would happen in url routing). The plurals are deliberate > > :) The 'context' object could be obtained via a thread-local or > > participation lookup, I've included it here for clarity. > > context = Context() > distros = Launchpad.search(context, distro_name='ubuntu') > branding = distros.get_branding() > bugs = distros.hot_bugs(limit=20) > people = bugs.get_subscribers() > people.update(bugs.get_assignees()) > people.update(bugs.get_reporters()) > branding.update(people.get_branding()) > context.realise() > distro = None > for bug in bugs: > if bug.distro != distro: > distro = bug.distro: > show_branding(distro) > ... > > > Now, let me annotate this with intent: > # Setup a context for doing a bunch of related things. > context = Context() > # Starting with our root object, get all distros matching the literal > ubuntu (so the set will be 1) > # - this doesn't necessarily query at this point. It /may/ but it is > not required to. > distros = Launchpad.search(context, distro_name='ubuntu') > # We're going to want to be able to traverse through to the branding > for whatever distro(s) we got back, so ask for that. > # Like before, queries are not necessary at this point. > branding = distros.get_branding() > # And so on through some more related things: > # bugs > bugs = distros.hot_bugs(limit=20) > # and their people > people = bugs.get_subscribers() > # We want people from multiple sources - this could do an OR or a > UNION when it actually executes. > people.update(bugs.get_assignees()) > people.update(bugs.get_reporters()) > # We want the branding for all the people we found. > branding.update(people.get_branding()) > # We're going to start accessing the individual objects now, > # so for clarity, trigger all the DB queries needed to satisfy > # what we've asked for. > context.realise() > # And the rest of the code is guaranteed (*) not to do DB queries > (*) For stuff we've converted. See under 'migration/interoperation'. > > > So the basic idea is to extend the idiom of working on sets of data - > groups - to the very top of our code stack. Rather than > Person.is_valid_team, we'd have Persons.are_valid_teams() or > Persons.filter(valid_team=True). > > I've thought up some possible conventions around this, but I'm sure > there is a lot missing. > Error reporting: > - just raise an exception. If we need to dispatch differently, we can > use an exception that wraps individual exceptions for per-row > granularity. Note that this will often wait until realise() to be > raised at all. > > Evaluation: > - lazy as we build out the things we need, then eager on realise(). > We could add a trigger for realize to __iter__ on groups but I think > it would reduce safety. > - realise() could potentially perform multiple queries per group - or > could use DecoratedResultSet (or similar) to perform a single query. A > primary point of this interface is to let folk working on performance > change from single query to multiple query *without changing the > callers*. > > Interoperation/Migration: > - For mutating queries, we need to return existing classes (because > I'm not proposing a mutating interface [yet]), so our existing Table > based classes stay, just as they are. > - we're read heavy: we can start moving methods off of the Table > classes and onto Groups incrementally. Where a mutating operation > needs a method, a transient Group can be created containing just > [self] to permit forwarding the method but not duplicating the code. > - things which have no mutating operation can be converted to > non-Table class - to Plain Old Python Objects, which can be unit > tested to our hearts content. > - the machinery needed to bootstrap should be minimal, so if we as a > team like the sound of this, we could sensibly say 'all new code > should be like this' and start a mass migration. > - we'd want to convert individual attributes at a time I think, in a > process like: > - add a Groups method to specify needing the attribute > - delegate to the Groups method from the current property/function > - migrate callers one at a time > - callers which are using Groups and don't specify needing a > converted attribute could hit a cross-check which would blow up. This > is perhaps the wrong approach and ugly though :). > - once all callers are migrated remove the delegation-to-Groups so > that we can't ever be bitten on that attribute again. Ever. > > Changes to how we do things: > - methods on 'domain objects' would never perform DB access. They > must be only using non-lazy attributes - the builder interface would > populate appropriate attributes (e.g., the columns we use (not all), > and the related data we obtained (such as self.branding)) > - we'd need a variant of cachedproperty which can answer from cache > but cannot populate it. > - we could easily calculate the canonical url of objects as we get > them from the DB (again, Readonly requests), avoiding non-trivial > O(N^2) lookup behaviour. > - We'd probably one one (or more) Group types per db table, > eventually. These Groups would be very similar to ResultSets except: > - they are not exposing a SQL interface. They are /typed/. > - they are specialised for the things being returned > - where multiple types may be returned, adaption would be a good way > to talk about one fraction of a group. E.g. search can return > people/projects etc - we could cast a search result to IPersons to > talk about things only relevant to Persons. > - Groups become responsible for injecting attributes we need into the > individual represents-a-row objects > > This can also be looked at as a combination of HQL/query builder - but > its really much less capable than I'd expect an arbitrary HQL to be. > > So - what do you guys think? Does it have legs? > > -Rob > > _______________________________________________ > Mailing list: https://launchpad.net/~launchpad-dev > Post to : launchpad-dev@lists.launchpad.net > Unsubscribe : https://launchpad.net/~launchpad-dev > More help : https://help.launchpad.net/ListHelp
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : launchpad-dev@lists.launchpad.net Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp