http://lwn.net/Articles/317070/

Fully automated bisecting with "git bisect run"

February 3, 2009

This article was contributed by Christian Couder

It's a common developer practice to track down a bug by looking for the change that introduced it. This is most efficiently done by performing a binary search between the last known working commit and the first known broken commit in the commit history. git bisect is a feature of the Git version control system that helps developers do just that.

git bisect may also be well known by LWN readers for heated discussions on the Linux kernel mailing list about "asking" (or "forcing" depending on the point of view) users to find the bad commit when they report a regression. But a little-known addition, git bisect run, can allow a developer to completely automate the process. This can be very useful and may enable switching to interesting new debugging workflows.

At each step of the binary search, git bisect checks out the source code at the commit chosen by the search. The user then has to test whether the software works. If it does, the user runs git bisect good; otherwise they run git bisect bad, and the search proceeds accordingly. git bisect run changes this: instead of asking the user, it uses a script or a shell command to determine whether the source code (which git bisect checked out automatically) is "good" or "bad".
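As a rough illustration of the search itself, here is a minimal sketch in plain shell. It assumes a strictly linear history in which commits are simply numbered, and is_good is a hypothetical stand-in for "build and test this commit"; real Git histories are graphs, so git bisect's actual choice of the next commit is more involved.

```shell
#!/bin/sh
# Hypothetical stand-in for "build and test commit number $1":
# here commits 1..41 work, and commit 42 and everything after it is broken.
is_good() {
    [ "$1" -lt 42 ]
}

# Print the first bad commit, given a known-good and a known-bad commit.
first_bad_commit() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_good "$mid"; then
            good=$mid      # plays the role of "git bisect good"
        else
            bad=$mid       # plays the role of "git bisect bad"
        fi
    done
    echo "$bad"
}

result=$(first_bad_commit 1 100)
echo "first bad commit: $result"
```

Each pass through the loop halves the range of suspects, which is why about seven tests are enough for a hundred commits.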

This idea was suggested by Bill Lear in March 2007, and I implemented it shortly thereafter. It was then released in Git 1.5.1.

Technically, the script or command passed to git bisect run is run at each step of the bisection process, and its exit code is interpreted as "good" if it is 0, or as "bad" if it is between 1 and 127, with the exception of 125, which marks the current commit as untestable so that it is skipped. Any other exit code aborts the bisection; see the git bisect documentation for more information.
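The exit code convention can be summarized with a small helper. This is only a sketch of the rules as the git bisect documentation describes them; classify_exit_code is a made-up name and is not part of Git.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): print how "git bisect run"
# would treat a given exit code of the test command.
classify_exit_code() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo good
    elif [ "$code" -eq 125 ]; then
        echo skip          # commit cannot be tested
    elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
        echo bad
    else
        echo abort         # anything else stops the bisection
    fi
}

for code in 0 1 2 125 127 128 255; do
    echo "$code -> $(classify_exit_code "$code")"
done
```

One practical consequence is that a test script should map unexpected failures to a code it really means, rather than letting an arbitrary status leak through.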

One simple and yet useful way to take advantage of that is to use git bisect run to find which commit broke the build. Some kernel developers like this very much. Ingo Molnar wrote:

for example git-bisect was godsent. I remember that years ago bisection of a bug was a very [laborious] task so that it was only used as a final, last-ditch approach for really nasty bugs. Today we can [autonomously] bisect build bugs via a simple shell command around "git-bisect run", without any human interaction!

For example, with a reasonably recent Git (version 1.5.2 or later), bisecting a build bug in the Linux kernel may be just a matter of launching:

    git bisect start linux-next/master v2.6.26-rc8
    git bisect run make kernel/fork.o

because the git bisect start command, when it is passed two (or more) revisions, here "linux-next/master" and "v2.6.26-rc8", interprets the first one as "bad" and the other ones as "good".

This works as follows: git bisect checks out the source code of a commit to be tested, then runs make kernel/fork.o. make will exit with code 0 if it builds, or with something else (usually 2) otherwise. This result is recorded as "good" or "bad" for the commit that was checked out. The binary search then picks another commit to check out, make runs again, and so on, until the first "bad" commit in the history is found.
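To try the same mechanics without a kernel tree, the following sketch builds a throwaway repository and bisects it. Everything in it is invented for the demonstration: the file names, the tag known-good, and the use of "sh -n" (a syntax check) as a stand-in for a real build.

```shell
#!/bin/sh
# Toy demonstration in a throwaway repository (all names are made up).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Create ten commits; from commit 7 onward prog.sh has a syntax error,
# which plays the role of a broken build.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then
        echo 'if then' > prog.sh
    else
        echo 'echo ok' > prog.sh
    fi
    echo "$i" > counter
    git add prog.sh counter
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then
        git tag known-good
    fi
    i=$((i + 1))
done

# The first revision is taken as "bad", the second as "good".
git bisect start HEAD known-good > /dev/null
# "sh -n" only parses the script: exit 0 if it parses, non-zero otherwise.
git bisect run sh -n prog.sh > /dev/null 2>&1

# Once the run has finished, refs/bisect/bad names the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $first_bad"
```

The output of git bisect is discarded here only to keep the demonstration quiet; in real use it is worth reading, since it prints the offending commit.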

Bisecting regressions that show up only in the running code, as opposed to build problems, is usually more complicated. You will probably have to write a test script and pass it to git bisect run.

For example, a test script for an application built with make and printing on its standard output might look like this:

    #!/bin/sh

    make || exit 125   # an exit code of 125 asks "git bisect"
                       # to "skip" the current commit

    # run the application and check that it produces good output
    ./my_app arg1 arg2 | grep 'my good output'

See this message from Junio Hamano, the Git maintainer, for explanations and a real world example of git bisect run used to find a regression in Git. The git bisect documentation has some short examples too.

It's even trickier for kernel hackers, because you have to reboot the computer each time you want to test a new kernel, but some kernel hackers suggest that it be used anyway if the problem is "reproducible, scriptable, and you have a second box". Ingo Molnar describes his bisection environment this way:

i have a fully automated bootup-hang bisection script. It is based on "git-bisect run". I run the script, it builds and boots kernels fully automatically, and when the bootup fails (the script notices that via the serial log, which it continuously watches - or via a timeout, if the system does not come up within 10 minutes it's a "bad" kernel), the script raises my attention via a beep and i power cycle the test box. (yeah, i should make use of a managed power outlet to 100% automate it)
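The timeout part of such a setup might be sketched as follows. This is not Ingo's script: check_with_timeout is a hypothetical helper, the commands passed to it below are stand-ins for a real "boot the test box and wait for it to come up" command, and the sketch assumes the timeout utility from GNU coreutils is available.

```shell
#!/bin/sh
# Sketch only: run a command with a time limit and reduce the outcome
# to 0 ("good") or 1 ("bad"), so that a hang counts as a bad commit.
# Assumes the "timeout" utility from GNU coreutils.
check_with_timeout() {
    limit=$1
    shift
    if timeout "$limit" "$@"; then
        return 0      # finished successfully within the limit
    else
        return 1      # failed, or was killed when the limit expired
    fi
}

# Demonstration with stand-in commands instead of a real boot:
check_with_timeout 5 true && fast=good || fast=bad
check_with_timeout 1 sleep 3 && slow=good || slow=bad
echo "fast: $fast, slow: $slow"
```

In a real test script the limit would be something like 600 seconds for the ten minutes mentioned above, and the script would exit with the helper's return value so that git bisect run records the verdict.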

So git bisect run can be used with a wide range of applications. For example, your nightly builds can automatically find the commit that broke the build or the test suite, and then use information from that commit to send a flame warning email to the developer responsible.
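A nightly job of that kind could look roughly like the sketch below. It runs in a throwaway repository, and the authors, addresses, tag name and the "test suite" (which passes as long as a file named broken is absent) are all invented; a real job would bisect between the last good nightly and the current tip, and would hand the result to a mailer instead of printing it.

```shell
#!/bin/sh
# Toy nightly check in a throwaway repository; authors, addresses and
# file names are invented for the illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# commit_as NAME EMAIL MESSAGE: commit the staged changes as that author.
commit_as() {
    git -c user.name="$1" -c user.email="$2" commit -q -m "$3"
}

n=1
while [ "$n" -le 6 ]; do
    echo "$n" > counter
    git add counter
    if [ "$n" -eq 4 ]; then
        # Bob's commit is the one that breaks the pretend test suite.
        touch broken
        git add broken
        commit_as "Bob" "bob@example.com" "change $n"
    else
        commit_as "Alice" "alice@example.com" "change $n"
    fi
    if [ "$n" -eq 1 ]; then
        git tag last-good-nightly
    fi
    n=$((n + 1))
done

# The pretend test suite passes as long as the file "broken" is absent.
git bisect start HEAD last-good-nightly > /dev/null
git bisect run sh -c 'test ! -e broken' > /dev/null 2>&1

# refs/bisect/bad now names the first bad commit; extract who wrote it.
culprit=$(git log -1 --format='%an <%ae>' refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "would send mail to: $culprit"
```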

But what may be more interesting is that fully automated bisection may enable new workflows. On the git mailing list, Andreas Ericsson, a Git developer, reported:

To me, I'd happily use any scm in the world, so long as it has git-bisect. Otoh, I'm a lazy bastard and love bisect so much that all our automated tests are focused around "git bisect run". This means bugs in software released to customers are few and far apart. When we get one reported, we just create a new test that exposes it, fire up git-bisect and then go to lunch. Quality costs, however. We pay that bill by using a workflow that's perhaps more convoluted than necessary.

So their workflow requires a little more work to make sure that every commit is small and easily bisectable. Then, to debug regressions, they follow these steps:

  • write, in the test suite, a test script that exposes the regression
  • use git bisect run to find the commit that introduced it
  • fix the bug, which the previous step often makes obvious
  • commit both the fix and the test script (and if needed more tests)

This may seem more complicated than a traditional workflow. But when asked about it, Andreas says:

I guess the real benefit is that "git bisect" makes the tests so immensely valuable, and so easy to write, that we do it gladly and quickly. The value comes *now* from almost all test-cases instead of in some far-distant and obscure future.

So this kind of workflow makes good use of the test cases you write. But what about overall productivity? Four months after having said that he uses git bisect run, Andreas Ericsson wrote that git bisect "is well-nigh single-handedly responsible for reducing our average bugreport-to-fix time from 4 days to 6 hours".

Now, after more than one year of using it, he gives the following details:

To give some hard figures, we used to have an average report-to-fix cycle of 142.6 hours (according to our somewhat weird bug-tracker which just measures wall-clock time). Since we moved to git, we've lowered that to 16.2 hours. Primarily because we can stay on top of the bugfixing now, and because everyone's jockeying to get to fix bugs (we're quite proud of how lazy we are to let git find the bugs for us). Each new release results in ~40% fewer bugs (almost certainly due to how we now feel about writing tests). That's a huge boost in code quality and productivity, and it earned me and my co-workers a rather nice bonus last year :)

So quality costs, but, when using the right tools and workflows, it can bring in a rather nice return on investment!

