Hi,

I'm the one responsible for this blunder, and I apologize for it. (And thanks to William for fixing this while I was sleeping.)

1) When you get an OOPS on staging, do a thorough analysis. That means looking at _all_ the OOPSes you get, and ensuring that the problem is a known problem and that nothing weird related to your changes shows up.

In my case, I only looked at one of the last OOPSes I got (OOPS-1998QASTAGING104), which showed no problem apart from the known recalculateBugHeat issue. But that was the OOPS from my 3rd attempt. The first one, OOPS-1998QASTAGING102 (which I didn't investigate), showed the problem with a cold cache: the new query took 9s there. (But it was very fast, <63ms, on the second and third attempts.)

2) When "tuning" queries, please leave comments in the code! There was no comment here, and I naively thought that I should get rid of the extra query that gets the archive ids and use a join instead. Bad, bad idea, it seems. A comment explaining this non-intuitive query would have saved me re-learning that already-learned lesson :-)

On 11-06-21 10:22 PM, Robert Collins wrote:
> We are currently dealing with bug 800485 where validation of
> sourcepackagenames has gone from 80ms to 1800ms(hot) or minutes
> (cold).
>
> This was caused when a patch changed a non-storm query to a storm
> query *and* added a single join table in (rather than the substituted
> archive ids).
>
> Most of our queries are now tuned; postgresql consistently chooses bad
> plans on the 'obvious' way to write things for many of our very large,
> or very skewed data sets.
>
> As a result, whenever you change a query on a big table - where big
> means > 20K rows - its important to try and exercise it on qastaging.
>
> If the thing you are testing times out, its *vital* that the timeout
> be positively identified as a pre-existing condition before assuming
> qastaging is slow[1].
>
> In this particular case, the patch was qa'd, but an existing timeout
> bug was assumed to be the cause of qa timeouts: we should have grabbed
> the oops and positively id'd the timeout as the existing bug - that
> would have told us about the regression and let us avoid the crisis.
>
> 1) how slow is qastaging? Its not, not really. It has enough memory on
> the DB server to page into hot cache the working set for any one page
> in the system: you may need to try a lot of times to seed the cache,
> but *everything* *can* work on qastaging.
>
>
> -Rob
>
> _______________________________________________
> Mailing list: https://launchpad.net/~launchpad-dev
> Post to     : launchpad-dev@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~launchpad-dev
> More help   : https://help.launchpad.net/ListHelp

--
Francis J. Lacoste
francis.laco...@canonical.com