https://dev.launchpad.net/LEP/PersistenceLayer sketches out the top-level constraints for the persistence layer project.
I wanted to get some thoughts out about the more technical aspects.

Firstly, by adding a new layer we're essentially partitioning our code; so what should go where? A starting point for answering this is design principles.

One major principle I have is that on-demand loading is actively harmful in high performance software: while doing without it is less convenient for ad hoc scripts, with it it's very hard to reliably avoid poor performance caused by object traversal triggering expensive (e.g. 3-4ms) queries thousands of times in a single web request.

This means we need to be able to get everything needed to satisfy zope security checks and so forth as part of the initial lookup description. To avoid repeating ourselves in code that uses the persistence layer, it's probably best to have any additional lookups (e.g. 'is member of admins') happen as part of the persistence layer (which may itself be two or three layers deep).

Actual query code should go in/under the persistence layer. I imagine we'll have some general code and some code specific to the backend stores that we have (which today is the three pg stores - session, launchpad, launchpad_slave).

I include collection size estimates in 'actual query code'. It would be nice to enable systematic use of size estimates in this layer, though it's not a deliberately scoped task.

Code that *requests* a partial object graph should become a consumer of the persistence layer.

Code that works on objects must live above the persistence layer. For instance, code that sends mail or chooses what to render in a template lives above the layer. This code can assume that objects returned from the persistence layer are all disclosable, and that all relevant objects are already in memory.

How, then, shall we describe what objects and what operations we want from the persistence layer? The foundations folk are working on a similar problem (but simplified - no transactions) in the webservice layer. https://dev.launchpad.net/LEP/WebservicePerformance has their draft efforts.
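To make the on-demand loading concern concrete, here is a minimal sketch that counts queries for the two styles. It uses an in-memory sqlite3 database rather than our pg stores, and the person/bug tables, the run() helper and both function names are made up for illustration; nothing here is real Launchpad code.

```python
import sqlite3

# Hypothetical schema for illustration only; not Launchpad's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bug (id INTEGER PRIMARY KEY, title TEXT, owner_id INTEGER);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, "person%d" % i) for i in range(1, 11)])
conn.executemany("INSERT INTO bug VALUES (?, ?, ?)",
                 [(i, "bug %d" % i, (i % 10) + 1) for i in range(1, 101)])

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, so the two styles can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def owners_on_demand():
    """Traverse bug -> owner lazily: one extra query per bug."""
    names = []
    for bug_id, owner_id in run("SELECT id, owner_id FROM bug ORDER BY id"):
        rows = run("SELECT name FROM person WHERE id = ?", (owner_id,))
        names.append(rows[0][0])
    return names


def owners_up_front():
    """Describe everything needed in the initial lookup: one query total."""
    rows = run("SELECT person.name FROM bug "
               "JOIN person ON person.id = bug.owner_id ORDER BY bug.id")
    return [name for (name,) in rows]


query_count = 0
lazy_names = owners_on_demand()
lazy_queries = query_count      # 1 query for the bugs + 1 per bug

query_count = 0
eager_names = owners_up_front()
eager_queries = query_count     # a single joined query
```

Both functions return the same list of owner names, but the on-demand version issues 101 queries against 1 for the up-front version. At 3-4ms per query that would be roughly 300-400ms spent on traversal alone, and that is with only 100 objects.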
Riffing on the foundations work, I spent some time exploring the space of prepping a query language in Python, for Python objects. It seems to me that having a mutable query object which is itself a graph, where the graph represents the object-level nodes and edges to retrieve, offers great flexibility and clear code.

For instance: assume that request startup creates a transaction object for us, and that we have a nominal root node that represents the system as a whole. We could then write some code like this to implement Person:+commentedbugs.

The view is constructed with the Person object, so:

    >>> query = launchpad.people.query()
    >>> query
    <'Person' search filter=None>
    >>> query.filter = Id(self.context.id)
    >>> query
    <'Person' search filter=Id(2)>

The bugs relation is all possible related bugs - a huge set. We want only those with comments by the person we're looking up:

    >>> query.bugs.filter = CommentBy(self.context)

We could write that in a more generic fashion, referencing some collection the query itself includes:

    >>> query.bugs.filter = CommentBy(query)

We want to slice the returned bugs:

    >>> query.bugs.slice(0,50)

And we want the total as well - we tell the layer we want that up front so that when it can be optimised, it is:

    >>> query.bugs.stats = Count()

And now transform the query into a result:

    >>> result = query.execute()
    >>> len(result)
    1
    >>> len(result.bugs)
    50
    >>> result.bugs.stats
    {'count': 254}

Relations that are not traversed are not queried; we can select down to individual attributes in a similar fashion to the .filter attribute, using a .get or .retrieve attribute.

What do you think? Does this sound nice to use?

-Rob

Mailing list: https://launchpad.net/~launchpad-dev

