Chris,
I agree with you on most points, in particular that
- MTBF is the most important thing, therefore
- topcrashers take priority over `ordinary' crashers
- "zero known topcrashers" is a reasonable strategy (as long as there are outstanding
topcrashers)
- this strategy has been successfully applied to past releases
However, I think it would be wrong to focus on topcrashers only. The main reason for
this is that I've seen many topcrashers being duped onto existing reports. Existing
bug reports provide steps to reproduce, and if these steps are missing it's very hard
to get topcrashers fixed. Therefore existing crasher reports can be very helpful for
the goal of increasing MTBF, but _only_ if the connection between a topcrash and an
existing bug report can be found.
The more open crasher bugs, the harder it becomes to spot the one that may be your
topcrash. Lots of open bugs means everyone has to spend a lot of time on testing and
duping crash reports. In the past, the number of open crasher bugs has been growing
slowly, but steadily. We should try to stop this trend.
The goal should be to reduce the number of crasher bugs. This will only work if the
target is zarro boogs. (Whether this goal will be reached for mozilla 0.9, mozilla 1.0
or mozilla 1.1, or even later is not the most important question here.) To be really
useful, the complete list of "known crashers" must be shorter than the current
most-frequent-bugs list; ideally it would have fewer than a dozen entries. (You'll hardly ever
get it down to zero because fixing newly reported crashers takes time.)
So is it at all possible? I believe it is, and my guess is that it wouldn't take much
additional developer time because much of the work is done by external QA volunteers
anyway. But we need a better framework:
- In Bugzilla, we need a standardized way to mark bugs that have "crash" in the
summary as non-crashers. I have seen several bugs where the crash keyword has been
added repeatedly, and every single time the bug owner has removed the keyword because
it didn't apply.
- In Bugzilla, we need a standardized way to mark "obscure" crashers. In XP cases you
would probably mark them as WORKSFORME, but crashers are often so platform sensitive
(occurs only on XYZ linux with XFree 4.x.y and glibc-whatever patch), hard to
reproduce, or intermittent that this resolution would be inappropriate.
- In Bugzilla, we need a standardized way to mark "known" crashers ("fixme" :). This
would apply to bugs that are well-understood by QA people (simple testcase exists),
where the stack trace is known, where a search for duplicates has been made, and where
you really have to know the code to make more progress. If you want to make life
really hard for volunteer QA, you can require a "full" stack trace from a debug build
as a prerequisite for this state (this would ensure that you know the exact line
number where the crash happens).
[These "known" crashers would be the interesting ones for developers. Obviously, there
would be fewer than 413 of them.]
- In Bugzilla, we need a standardized place for the function name where the crash
occurs. There already exists a notation [@ nsFoo::Bar] for topcrashers, so we have to
decide if this notation should also be used for the rest. If this is agreed upon, it
should be applied to every crasher bug where a stack trace is known. If there are
multiple stack traces in one bug, then either the bug has to be split, or all of the
possible crash locations are put there.
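To illustrate why a standardized place for the crash location would help: once every
crasher summary carries the [@ nsFoo::Bar] marker, matching a new topcrash against
existing reports becomes a simple lookup. The following is only a rough sketch. The
exact grammar of the marker, the function names, and the bug ids and summaries are
all made up for illustration; it does not talk to Bugzilla at all.

```python
import re
from collections import defaultdict

# Matches the "[@ nsFoo::Bar]" marker. The exact grammar is an assumption.
SIGNATURE_RE = re.compile(r"\[@\s*([^\]]+?)\s*\]")


def extract_signatures(summary):
    """Return every crash location named in a bug summary."""
    return SIGNATURE_RE.findall(summary)


def index_by_signature(bugs):
    """Map each crash location to the ids of the bugs that mention it."""
    index = defaultdict(list)
    for bug_id, summary in bugs.items():
        for signature in extract_signatures(summary):
            index[signature].append(bug_id)
    return index


# Hypothetical bug ids and summaries, for illustration only.
bugs = {
    1001: "Crash on startup [@ nsFoo::Bar]",
    1002: "Crash closing window [@ nsFoo::Bar] [@ nsBaz::Qux]",
    1003: "Crash in summary but no stack trace yet",
}

index = index_by_signature(bugs)
# Candidate duplicates for a new topcrash in nsFoo::Bar.
print(index["nsFoo::Bar"])
```

A bug with several crash locations simply shows up under each of them, which covers
the "all of the possible crash locations are put there" case.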
Furthermore, I would like to see an official policy that existing crasher fixes ought
to be checked in without delay, usually within a week. There have been cases where a
known fix has not been applied for weeks, or even more than a month, apparently only
because there is no milestone. This is especially bad in the case of topcrash regressions,
where you get lots of dupes; handling them wastes time that everybody could better
spend on actually fixing things.
As a side-effect, getting the number of crashers down may have a motivating effect on
everybody. Also remember that super-reviewers these days pay attention to good code
style, indentation etc. I believe that crashing code is at least as serious as this.
Mozilla has already come a long way. Let's make it fast as lightning and solid as a
rock, then everybody will be happy. :)
Andreas