On 29.01.2010 10:54, Jonathan Lange wrote: > Sorry, I had meant to reply to all. > > On Fri, Jan 29, 2010 at 9:53 AM, Jonathan Lange <[email protected]> wrote: >> On Thu, Jan 28, 2010 at 7:30 PM, Abel Deuring >> <[email protected]> wrote: >>> Hi Stuart, hi Jono, >>> >> Hello Abel, >> >>> Tom and myself started to work on sorting bugs/bugtasks by the age of >>> the youngest patch attachment (see >>> https://bugs.edge.launchpad.net/malone/+bug/506018 , >>> https://bugs.edge.launchpad.net/malone/+bug/512500 ). Basically, this >>> means to add an option "sort by creation time of youngest patch >>> attachment" to IHasBugs.searchTasks(). >>> >>> We are a bit concerned that a regulary query, where we sort by the >>> youngest patch via something like >>> >>> ORDER BY (SELECT max(message.datecreated) >>> FROM message, bugattachment >>> WHERE bugattachment.message=message.id AND >>> bugattachment.bug=<bug.id from main query> AND >>> bugattachment.type=1) >>> >>> in a query like "SELECT BugTask.whatever FROM BugTask,... WHERE...", >>> might easily lead to timeouts when this sorting is used for Ubuntu >>> bugs... (Tom wanted to write such a query for a test on staging, so we >>> might soon have some data.) >>> >> Have you got data yet?
yes. This is the query: https://pastebin.canonical.com/27199/ It is a query that is generated for an lplib call like ubuntu.searchTasks(has_patch=True) but with a modified ORDER BY expression, basically the above one. mthaddon ran an EXPLAIN ANALYSE for it on staging. The result: https://pastebin.canonical.com/27202/ . 9 minutes run time; as I understand it, most time is spent for sorting... But the query might not be the best one can think of... >> >>> So we also considered to add a column date_created_youngest_patch to the >>> table Bug, where this column would be updated only by triggers for >>> changes of the table BugAttachment, like so: >>> >> ... >>> In other words, a deliberate denormalisation. What do you think, should >>> we start with a plain query without such a denormalsation, or do you >>> think that it is better to add the "cache column" >>> date_created_youngest_patch to the Bug table? >>> >> This is really Stuart's province. I don't have a strong opinion on this. >> >> However, it seems to me that we're getting increasingly frequent >> requests to denormalize data in order to make querying faster. Perhaps >> we need a general answer, or some other technology. Perhaps, but finding a general answer is not trivial, I think. "Think harder to find a better query" is one answer, but won't give useful results in each case; "We don't really need such a query" is another (though Bryce had good reasons to ask for the sort order we're discussing); "Let's accept denormalisation after thoughtful consideration if we find no better way" seems most reasonable to me. Regarding other technology, I'm quite curious about suggestions. Abel _______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : [email protected] Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp

