One idea that came up talking to Robert about the Services design was exposing an external map/reduce interface, mentioned under <https://dev.launchpad.net/ArchitectureGuide/ServicesRoadmap#A map/reduce facility>. It is pretty blue sky at the moment but I think it is such an interesting idea it would be worth writing down.
Launchpad does a lot of API traffic, which could be (citation needed) broadly grouped into many small users doing a few calls to eg file a bug, plus a number of clients that do huge amounts of bulk traffic. These bulk users are typically pulling data out of Launchpad to do offline bulk digestion, for instance to draw <http://people.canonical.com/~mbp/kanban/canonical-bazaar-kanban.html> or many other different Ubuntu reports. Typically they want something like "all bugs, with their tasks and mps, assigned to people in ~canonical-bazaar and either underway or finished in the last 30 days" or "all in progress Ubuntu bugs" or "the top 1000 ubuntu bugs by number of affected users." These tools take a long time to run, do thousands of API requests, and probably thereby put a fair load on Launchpad.

0- Some of these are things that could be done in Launchpad but are not yet. It might be good to include some of them, such as a Kanban view of bugs or some of the Ubuntu QA reports, within Launchpad itself, and the efforts to make it easier to change Launchpad and easier to get spontaneously contributed changes landed help with that. But many have, to a greater or lesser degree, some user-specific policy that might be hard to generalize; Launchpad doesn't necessarily want to get every useful feature within the main UI; and many tools can be useful enough without being at the level of quality that would justify being widely available.

1- Another way to tackle this is to provide more aggregated APIs, like "give me all the bugs assigned and touched recently, with their tasks and mps" in one go. The REST API enhancement LEP <https://dev.launchpad.net/LEP/WebservicePerformance> goes towards this by offering a generic expand-out feature, though I think it would also be useful to have some APIs that just have hardcoded common sense, eg that if you get a bug you want its tasks too.
That would let the kanban software make O(1) calls, each returning a moderately large response, to draw everything.

2- Another approach is to make it easier for the client to maintain an offline cache by emphasizing "get me changes since date X" or "get me objects ordered by last change" (key cases like bugs already exist), together with a client library that makes intelligent use of this, abstracted from the application code. I think Arsenal does this. Getting better APIs, and better handling of cached results, would let API clients do totally general work with probably something like 10x to 100x fewer API calls, correspondingly faster run times, and nearly that much less Launchpad server load.

3- Robert pointed out that having every API user keep a replicated copy of parts of the Launchpad database is perhaps not the most elegant solution, compared to doing this work on the server. They could instead send a kind of map/reduce expression to the server and get back the results. So things like the kanban that want to say "give me everything assigned to mbp or jam or ... and either inprogress or (fixreleased and fixed in the last 30 days)" could build a (say) JavaScript expression for that and get back just the actually relevant bugs, rather than fetching a lot more stuff and filtering client side. Tools that want to count or summarize bugs in various states can obviously reduce it on the server side too.

One issue in doing this would be designing or choosing an expression language and deploying it. Perhaps a larger issue is that some of these jobs may take a long time; perhaps longer than is realistic for a single web request; certainly longer than is permitted in a single call at the moment. Badly designed calls might create a lot of load. So possibly this should be done out of a separate data warehouse, which would also be a chance to move the data into a form that is more suited to mapreduce queries.
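
To make the map/reduce idea in point 3 a bit more concrete, here is a rough sketch in Python (rather than JavaScript) of the kind of expression a client might send. Everything in it is made up for illustration: the `Bug` record, the `map_reduce` and `kanban_query` helpers, the status strings and the sample data are assumptions, not an existing Launchpad API or schema, and it runs over an in-memory list rather than a real server or data warehouse.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class Bug:
    # Hypothetical, simplified record; not Launchpad's real schema.
    id: int
    assignee: str
    status: str
    date_fixed: Optional[datetime] = None


def map_reduce(records, map_fn, reduce_fn, initial):
    """Apply map_fn to every record, drop None results, fold the rest."""
    mapped = (map_fn(record) for record in records)
    kept = (item for item in mapped if item is not None)
    return reduce(reduce_fn, kept, initial)


def kanban_query(team, now, window=timedelta(days=30)):
    """Build a map function that keeps only bugs the kanban cares about."""
    def map_fn(bug):
        if bug.assignee not in team:
            return None
        if bug.status == "In Progress":
            return bug
        recently_fixed = (
            bug.status == "Fix Released"
            and bug.date_fixed is not None
            and now - bug.date_fixed <= window
        )
        return bug if recently_fixed else None
    return map_fn


def collect_ids(acc, bug):
    """Reducer: gather the ids of the matching bugs."""
    return acc + [bug.id]


def count_by_status(acc, bug):
    """Reducer: summarize matching bugs as a count per status."""
    counts = dict(acc)
    counts[bug.status] = counts.get(bug.status, 0) + 1
    return counts


# Arbitrary sample data and reference date, purely for the example.
NOW = datetime(2011, 6, 1)
BUGS = [
    Bug(1, "mbp", "In Progress"),
    Bug(2, "jam", "Fix Released", datetime(2011, 5, 20)),
    Bug(3, "jam", "Fix Released", datetime(2011, 1, 5)),
    Bug(4, "someone-else", "In Progress"),
    Bug(5, "mbp", "Triaged"),
]

query = kanban_query({"mbp", "jam"}, NOW)
relevant_ids = map_reduce(BUGS, query, collect_ids, [])
counts = map_reduce(BUGS, query, count_by_status, {})
print(relevant_ids)  # [1, 2]
print(counts)        # {'In Progress': 1, 'Fix Released': 1}
```

The point of the split is that the same filter can feed different reducers: one returns just the relevant bugs for drawing the kanban, the other returns only a summary, so in both cases far less data would cross the wire than with client-side filtering.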
This does seem kind of a long path, so I wonder if there are mapreducey things that can be done within the existing synchronous, real-data REST setup.

Martin