I've a new structure for our code that I'd like to propose. There's a bunch of things that intersect here, and I'm *massively* tired from an excellent, but intense UDS + travel... so apologies in advance for any incoherence in what follows.
We face many problems in our code with optimising pages. Some specific ones are: - impedance mismatch between sql-optimised helpers and code on a 'model' object - small changes can cause death-by-queries when a previously untuned object is added to a template and triggers per-row queries. I and others have been talking round some ideas about this for a while. One idea is a High Level Query Language and introducing an explicit mapping layer. Another is pushing much more eager loading facilities into and through our ORM - Storm. Both of those things are good things to do but are either hard, incomplete or potentially high risk. That doesn't mean we shouldn't work on them! However, I think a more important and fundamental aspect of the problem is that the pattern we use - and which many ORMs use, is inclined *by design* to grow complexity and trigger DB queries at just the wrong time. That pattern is the "Active Record" pattern, if you have the PoEAA book (my copy is in a box...somewhere). Basically it says that a single object represents a single row derived from the database, and that changes to it are pushed to the database, and attribute access on it traverses the object graph from the database. This pattern affects us in several ways: - its the root cause of our death-by-queries behaviour... particularly the way that previously tuned pages regress. - its the root cause for many performance problems in our APIs So I have an idea that I think shows some promise. Its at least superficially similar to the 'RecordSet' pattern (PoEAA again), but I'm going to define it entirely here to avoid confusion, because RecordSet may actually be different ( I can't look at the full details until I find my PoEAA book), and finally because RecordSet interacts with other patterns that we're not really in a place to implement without almost-certain significant disruption deep in our stack. Its also related to the existing Colllection idioms, but addresses multiple types and also our default behaviours. Lets call this a Group based approach for now (because 'Sets' have an over-used history in Launchpad). Firstly, an example of what code using this might look like (typically part of this would happen in url routing). The plurals are deliberate :) The 'context' object could be obtained via a thread-local or participation lookup, I've included it here for clarity. context = Context() distros = Launchpad.search(context, distro_name='ubuntu') branding = distros.get_branding() bugs = distros.hot_bugs(limit=20) people = bugs.get_subscribers() people.update(bugs.get_assignees()) people.update(bugs.get_reporters()) branding.update(people.get_branding()) context.realise() distro = None for bug in bugs: if bug.distro != distro: distro = bug.distro: show_branding(distro) ... Now, let me annotate this with intent: # Setup a context for doing a bunch of related things. context = Context() # Starting with our root object, get all distros matching the literal ubuntu (so the set will be 1) # - this doesn't necessarily query at this point. It /may/ but it is not required to. distros = Launchpad.search(context, distro_name='ubuntu') # We're going to want to be able to traverse through to the branding for whatever distro(s) we got back, so ask for that. # Like before, queries are not necessary at this point. branding = distros.get_branding() # And so on through some more related things: # bugs bugs = distros.hot_bugs(limit=20) # and their people people = bugs.get_subscribers() # We want people from multiple sources - this could do an OR or a UNION when it actually executes. people.update(bugs.get_assignees()) people.update(bugs.get_reporters()) # We want the branding for all the people we found. branding.update(people.get_branding()) # We're going to start accessing the individual objects now, # so for clarity, trigger all the DB queries needed to satisfy # what we've asked for. context.realise() # And the rest of the code is guaranteed (*) not to do DB queries (*) For stuff we've converted. See under 'migration/interoperation'. So the basic idea is to extend the idiom of working on sets of data - groups - to the very top of our code stack. Rather than Person.is_valid_team, we'd have Persons.are_valid_teams() or Persons.filter(valid_team=True). I've thought up some possible conventions around this, but I'm sure there is a lot missing. Error reporting: - just raise an exception. If we need to dispatch differently, we can use an exception that wraps individual exceptions for per-row granularity. Note that this will often wait until realise() to be raised at all. Evaluation: - lazy as we build out the things we need, then eager on realise(). We could add a trigger for realize to __iter__ on groups but I think it would reduce safety. - realise() could potentially perform multiple queries per group - or could use DecoratedResultSet (or similar) to perform a single query. A primary point of this interface is to let folk working on performance change from single query to multiple query *without changing the callers*. Interoperation/Migration: - For mutating queries, we need to return existing classes (because I'm not proposing a mutating interface [yet]), so our existing Table based classes stay, just as they are. - we're read heavy: we can start moving methods off of the Table classes and onto Groups incrementally. Where a mutating operation needs a method, a transient Group can be created containing just [self] to permit forwarding the method but not duplicating the code. - things which have no mutating operation can be converted to non-Table class - to Plain Old Python Objects, which can be unit tested to our hearts content. - the machinery needed to bootstrap should be minimal, so if we as a team like the sound of this, we could sensibly say 'all new code should be like this' and start a mass migration. - we'd want to convert individual attributes at a time I think, in a process like: - add a Groups method to specify needing the attribute - delegate to the Groups method from the current property/function - migrate callers one at a time - callers which are using Groups and don't specify needing a converted attribute could hit a cross-check which would blow up. This is perhaps the wrong approach and ugly though :). - once all callers are migrated remove the delegation-to-Groups so that we can't ever be bitten on that attribute again. Ever. Changes to how we do things: - methods on 'domain objects' would never perform DB access. They must be only using non-lazy attributes - the builder interface would populate appropriate attributes (e.g., the columns we use (not all), and the related data we obtained (such as self.branding)) - we'd need a variant of cachedproperty which can answer from cache but cannot populate it. - we could easily calculate the canonical url of objects as we get them from the DB (again, Readonly requests), avoiding non-trivial O(N^2) lookup behaviour. - We'd probably one one (or more) Group types per db table, eventually. These Groups would be very similar to ResultSets except: - they are not exposing a SQL interface. They are /typed/. - they are specialised for the things being returned - where multiple types may be returned, adaption would be a good way to talk about one fraction of a group. E.g. search can return people/projects etc - we could cast a search result to IPersons to talk about things only relevant to Persons. - Groups become responsible for injecting attributes we need into the individual represents-a-row objects This can also be looked at as a combination of HQL/query builder - but its really much less capable than I'd expect an arbitrary HQL to be. So - what do you guys think? Does it have legs? -Rob _______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : launchpad-dev@lists.launchpad.net Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp