I created an asana task for me to look into the health checks.

On Thu, Dec 12, 2013 at 1:02 PM, Evan Dandrea <[email protected]> wrote:
> On 12 December 2013 14:46, Paul Larson <[email protected]> wrote:
>> On Thu, Dec 12, 2013 at 4:01 AM, Evan Dandrea
>> <[email protected]> wrote:
>>> - Siva mentioned that the expected device wasn't appearing in `adb
>>> devices`. Can we have a nagios check for this so we know sooner?
>> As for detection, we should investigate if there's a good way to do
>> this in nagios or in the jobs themselves. I think it sounds feasible
>> but bad device detection may be better integrated into the jobs rather
>> than relying on an external service that doesn't know what state
>> things are expected to be in.
>
> Agreed. Nagios is really just a means of alerting based on some
> condition. The jobs could handle identifying when something has gone
> awry, tell Jenkins to hold the line, and drop a hint to nagios (a file
> in an expected location).
>
>> Another thing that should help is the
>> megajob refactor that Andy has been working on. It would at least
>> deal better with a situation where we lose a device and not require
>> regenerating all the jobs to get things moving again. After this goes
>> in, I'd like to see about adding some sort of a health check that
>> figures out if the device is at least reachable, and marks it
>> bad/offline if not. Before that though, we need all the bits in place
>> to detect the image on it and reflash if not.
>
> Paul, are you happy to take a task for the health check, pending the
> refactor? Can you have it drop a file to hint to nagios that a phone
> is dead (removing that file when things are clear)?
>
> Where do we stand on the megajob refactoring, Andy?
>
>> I think there are some things we could do to improve this (see above)
>> and continue to look for new ways to make it as reliable as the
>> devices will allow us to make it.
>
> Thanks Paul!
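
For reference, the health check discussed above (is the expected device listed in `adb devices`, and if not, drop a hint file for nagios and remove it once things are clear) could look roughly like the sketch below. This is only an illustration, not what the jobs do today: the function names, the `<serial>.dead` file naming, and the flag directory are all made up here, and it assumes a device counts as healthy only when `adb devices` reports its state as `device`.

```python
import os
import subprocess


def parse_adb_devices(output):
    """Return {serial: state} parsed from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        parts = line.split()
        # Device rows have exactly two fields (serial, state); the
        # "List of devices attached" header and blank lines do not.
        if len(parts) == 2:
            devices[parts[0]] = parts[1]
    return devices


def update_flag(serial, devices, flag_dir):
    """Create <flag_dir>/<serial>.dead when the device is not healthy,
    remove it when the device is back. Returns True if healthy.

    The ".dead" naming and flag_dir location are hypothetical.
    """
    flag = os.path.join(flag_dir, serial + ".dead")
    healthy = devices.get(serial) == "device"
    if healthy:
        if os.path.exists(flag):
            os.remove(flag)
    else:
        # Record the last seen state so the alert carries a hint.
        with open(flag, "w") as f:
            f.write(devices.get(serial, "missing") + "\n")
    return healthy


def check_device(serial, flag_dir):
    """Run `adb devices` and update the hint file for one device."""
    output = subprocess.check_output(["adb", "devices"]).decode()
    return update_flag(serial, parse_adb_devices(output), flag_dir)
```

A job could call `check_device(serial, flag_dir)` before starting and refuse to run if it returns False; a nagios check would then only need to alert on the presence of any `*.dead` file in that directory.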
--
Mailing list: https://launchpad.net/~canonical-ci-engineering
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~canonical-ci-engineering
More help   : https://help.launchpad.net/ListHelp

