Hi chromium-dev,
  A small group of us joined forces to create a "Green Tree" task force. The
goal of this task force is to make sure the tree stays green most of the
time. The two main pain points we are attacking right now are "reducing the
buildbot cycle time", to catch errors earlier, and "getting rid of the
flakiness", to make sure the tree does not turn red for no reason.

  I'll be prepending "[Green Tree]" to the emails I send related to the task
force.

  You can also follow our progress and our tasks here:
http://code.google.com/p/chromium/issues/list?q=label:GreenTreeTaskForce

For those interested, these are the highlights of the past week:

- Make sure all the tasks have bugs associated with them (pamg)
- Make sure VMware Tools is installed on all the slaves (bev / nsylvain)
- Disable all services that we don't need on the slaves (bev)
- Split the Windows Chromium tests across 3 slaves (maruel)
- Change the gatekeeper to close the tree on more failures (maruel)
- Change LKGR to care about more tests, and make it cycle faster (maruel)
- Write a status page to see the cycle speed on the slaves (nsylvain)
- Make sure we build only what we need on Mac (thomasvl)
- Add more try bots (linux views, valgrind) (maruel)
- Refactor Linux Valgrind buildbots into builders/testers (mmoss)
- Create a dashboard to see the slowest tests (phajdan)
- Speed up the transfer of builds between builders/testers by reducing the
compression (mmoss)

  I'm sure I forgot some; feel free to add to this list.

  Despite our efforts, this was one of the worst weeks we've seen in a long
time in terms of tree closures. This was caused by 5 main events:

 - Buildbot maintenance went wrong. Changing a mounted drive on the
buildbot master corrupted the mount table, and we had to reboot the
main server. We started the maintenance at 7:30 AM (Pacific) and got the
buildbot back online shortly after 10 AM. It had to cycle a little, so the
tree was closed for almost 3 hours.
 - A WebKit merge left some failures in the tree, and it looks like everyone
left without fixing them, so the tree stayed closed overnight. We fixed them
in the morning, but before reopening we let another WebKit merge go in, and
it also broke the tree, requiring a change on webkit.org to fix the
reliability tests (IIRC). Total closure time: 20 hours.
 - A bad gclient change got checked in. Some machines stopped running
"runhooks" and some bots got confused. The damage seems to have been
limited.
 - A second bad gclient change got checked in, this time causing all the
bots to throw away their checkouts. Almost every slave had to do a full
checkout (which takes an hour or so), and some of them ran out of disk
space, so we had to fix them manually. The tree was closed for another
couple of hours.
 - A bad DEPS file got checked in, again causing a bunch of slaves to throw
away their checkouts. The tree was closed for another hour or two.

Nicolas

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
