Image #60 was not successfully produced. This obviously created problems for the landing team. Let's take a few minutes to brainstorm on what went wrong and come up with some better policy to prevent this from happening again.
- Why wasn't an email sent out to the teams affected by the moving of phones to a new host at least a day or two in advance of the move? We have done this before with scheduled DC maintenance and with the 1SS move. We should be doing it every time we make a change that affects other teams.
- I realise they were unrelated events, but was there any conceivable way we could've caught the device failure that followed? That is, could we have kicked off some test runs, or aligned the move to the image production time? The answer here may be no, but I want to at least discuss why.
- Siva mentioned that the expected device wasn't appearing in `adb devices`. Can we have a nagios check for this so we know sooner?
- Is there anything else you think we could've done to better manage this? Short of moving to the Airline, are there things you think we could be doing to make us more resilient to this kind of failure?

Thanks!

Context:

8:42 AM <didrocks> cihelp: is it me or the ci dashboard has some issues? (can be the backend)
8:42 AM <didrocks> no image 60 results
8:42 AM <didrocks> image 61 run all for mako, but stopped on maguro
8:42 AM <didrocks> image 62 should start soon I guess
8:55 AM <psivaa> didrocks: the touch devices were being moved to a new host last night..
8:55 AM <didrocks> psivaa: hum, did I miss an email?
8:56 AM <psivaa> didrocks: no,
8:56 AM <psivaa> <plars> psivaa: so if didrocks is wondering in the morning what happened to image 60, it was a victim of moving those devices to a new host :(
8:56 AM <didrocks> would better to get an email for it :/
8:56 AM <didrocks> ev: can we establish some procedure for this? ^
8:56 AM <didrocks> psivaa: so, the new image is running tests, now?
8:57 AM <psivaa> didrocks: but according to plars the image 61 should be going along well..
8:57 AM <psivaa> let me check please
8:57 AM <didrocks> psivaa: 61 doesn't have maguro tests
8:57 AM <didrocks> well, didn't finish them
8:58 AM <psivaa> didrocks: yea the device disappeared during camera app tests. let me see if i can find it in the host
9:05 AM <ev> didrocks: we're supposed to already be doing that. Larry sends out an email with each hardware move, but I guess the phones have been considered something of a grey area. I'll make sure the team knows that we need to be sending out warnings with any kind of change that would affect running services, including any hardware changes.
9:06 AM <didrocks> ev: yeah, and if it's a planned change as well, some days in advance can help :)
9:06 AM <didrocks> especially as I was really interesting in the results from run #60, and we'll never have it :/
9:09 AM <psivaa> didrocks: the particular maguro is not showing up on the host either.. something must have happened. we need someone to take a look in person. I'll run the tests with another device for now
9:09 AM <didrocks> psivaa: ok, thanks ;)

-- 
Mailing list: https://launchpad.net/~canonical-ci-engineering
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~canonical-ci-engineering
More help   : https://help.launchpad.net/ListHelp
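
To make the nagios question above more concrete, here is a rough sketch of what such a check might look like. It is only a starting point for discussion, not something we run today. It assumes the standard Nagios plugin convention (exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one status line on stdout) and that the expected device serials are passed as arguments. The function names and the serials used in any example are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check that expected devices appear in `adb devices`."""
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_adb_devices(output):
    """Return a {serial: state} dict parsed from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def evaluate(expected, devices):
    """Map the expected serials and the parsed devices to (exit code, message)."""
    missing = [s for s in expected if s not in devices]
    # A serial can be listed but unusable, e.g. in the "offline" state.
    not_ready = [
        f"{s} ({devices[s]})"
        for s in expected
        if s in devices and devices[s] != "device"
    ]
    if missing:
        return CRITICAL, "CRITICAL: missing from adb: " + ", ".join(missing)
    if not_ready:
        return WARNING, "WARNING: not ready: " + ", ".join(not_ready)
    return OK, f"OK: {len(expected)} expected device(s) attached"


def main(argv):
    """Run `adb devices`, print one status line, and return the exit code."""
    expected = argv[1:]
    if not expected:
        print("UNKNOWN: usage: check_adb_devices.py SERIAL [SERIAL ...]")
        return UNKNOWN
    try:
        result = subprocess.run(
            ["adb", "devices"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"UNKNOWN: could not run adb: {exc}")
        return UNKNOWN
    status, message = evaluate(expected, parse_adb_devices(result.stdout))
    print(message)
    return status
```

To use it as a plugin, the script would end with `import sys; sys.exit(main(sys.argv))` and be registered on each device host with that host's serials. Treating a missing device as CRITICAL and an offline one as WARNING is a judgment call we may want to revisit.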

