Hi chromium-dev,

A small group of us joined forces to create a "Green Tree" task force. The goal of this task force is to make sure the tree stays green most of the time. The 2 main pain points that we are attacking at this time are "reducing the buildbot cycle time", to catch errors earlier, and "getting rid of the flakiness", to make sure the tree does not turn red for no reason.

I'll be prepending "[Green Tree]" to the emails I send related to the task force. You can also follow the progress and our tasks here:
http://code.google.com/p/chromium/issues/list?q=label:GreenTreeTaskForce

For those interested, these are the highlights of the last week:

- Make sure all the tasks have bugs associated with them (pamg)
- Make sure VMware Tools is installed on all the slaves (bev / nsylvain)
- Disable all services that we don't need on the slaves (bev)
- Split the Windows chromium tests across 3 slaves (maruel)
- Change the gatekeeper to close the tree on more failures (maruel)
- Change LKGR to care about more tests, and make it cycle faster (maruel)
- Write a status page to see the cycle speed on the slaves (nsylvain)
- Make sure we build only what we need on Mac (thomasvl)
- Add more try bots (linux views, valgrind) (maruel)
- Refactor Linux Valgrind buildbots into builder/testers (mmoss)
- Create a dashboard to see the slowest tests (phajdan)
- Speed up the transfer of builds between builders/testers by reducing the compression (mmoss)

I'm sure I forgot some, feel free to append to this list.

Despite our efforts, this was one of the worst weeks we've seen in a long time in terms of tree closure. This was caused by 5 main events:

- Buildbot maintenance went wrong. By changing a mounted drive on the buildbot master, the mount table got corrupted, and we had to reboot the main server. We started the maintenance at 7:30AM (pacific) and we got the buildbot back online shortly after 10AM. It had to cycle a little, so the tree was closed for almost 3 hours.
- A WebKit merge left some failures in the tree. It looks like everyone left without fixing them, so the tree stayed closed overnight. We fixed it in the morning, but before reopening we let another WebKit merge go by, and it also broke the tree, requiring a change on webkit.org to fix the reliability tests (IIRC). Total closure time: 20 hours.
- A bad gclient change got checked in. Some machines stopped running "runhooks" and some bots got confused. The damage seems to have been limited.
- A second bad gclient change got checked in. This time it caused all the bots to throw away their checkouts. Almost every slave had to do a full checkout (which takes an hour or so), and some of them ran out of disk space, so we had to fix them manually. The tree was closed for another couple of hours.
- A bad DEPS file got checked in. This again caused a bunch of slaves to throw away their checkouts. The tree was closed for another hour or two.

Nicolas

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe:
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---