On 11/9/2012 11:38 PM, Walter Bright wrote:
[...]
I'll often use printf because although the debugger knows types, it rarely shows the values in a form I particularly need to track down a problem, which tends to be different every time. And besides, throwing in a few printfs is fast and easy, whereas setting break points and stepping through a program is an awfully tedious process. Or maybe I never learned to use a debugger properly, which is possible since I've been forced to debug systems where no debugger whatsoever was available - I've even debugged programs using an oscilloscope, making clicks on a speaker, blinking an LED, whatever is available.

You're making my point for me, Walter! I have seen some people whiz through the debugger like they live in it, but I would say that level of familiarity tends to be the exception rather than the rule. And it always makes me a little uncomfortable when I see it (why would someone *need* to be that proficient with the debugger...?). Firing up the debugger, for many people, is a relatively expensive process, because it isn't something that good programmers should be doing very often (unless you subscribe to the school which says that you should always step through new code in the debugger...consider this an alternative to writing unit tests).

Note that getting a call stack for a seg fault does not suffer from these problems. I just:

   gdb --args dmd foo.d

and whammo, I got my stack trace, complete with files and line numbers.

There are two issues here. 1) Bugs which don't manifest as a segfault. 2) Bugs in which a segfault is the manifestation, but the root cause is far away (i.e.: not even in the call stack). I will say more on this below.

[...]
Especially when there may be hundreds of instances running, while only a few actually experience a problem, logging usually turns out to be the better choice. Then consider that logging is also more useful for bug reporting, as well as visualizing the code flow even in non-error cases.

Sure, but that doesn't apply to dmd. What's best practice for one kind of program isn't for another.

There are many times when a command-line program offers logging of some sort which has helped me identify a problem (often a configuration error on my part). Some obvious examples are command shell scripts (which, by default, simply tell you everything they are doing...both annoying and useful) and makefiles (large build systems with hundreds of makefiles almost always require a verbose mode to help debug a badly written makefile).

Also, note that when I am debugging a service, I am usually using it in a style which is equivalent to dmd. That is, I get a repro case, I send it in to a standalone instance, and I look at the response and the logs. This is really no different from invoking dmd on a repro case. Even in this scenario, logs are incredibly useful because they tell me the approximate location where something went wrong. Sometimes, this is enough to go look in the source and spot the error, and other times, I have to attach a debugger. But even when I have to go to the debugger, the logs let me skip 90% of the single-stepping I might otherwise have to do (because they tell me where things *probably worked correctly*).

[...]
I've tried that (see the LOG macros in template.c). It doesn't work very well, because the logging data tends to be far too voluminous. I like to tailor it to each specific problem. It's faster for me, and works.

The problem is not that a logging system doesn't work very well, but that a logging system without a configuration system is not first-class, and *that* is what doesn't work very well. If you had something like log4j available, you would be able to tailor the output to something manageable. An all-or-nothing log is definitely too much data when you turn it on.
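To make that concrete, per-category log levels can be sketched in a few lines of C++. This is purely illustrative — the names (LOG, logEnabled, the category strings) are hypothetical and are not the actual macros in template.c:

```cpp
#include <cstdio>
#include <map>
#include <string>

// Log4j-style idea in miniature: each category gets its own level,
// so verbosity can be turned up only where the bug is being hunted.
enum Level { Off, Error, Info, Debug };

static std::map<std::string, Level> logConfig = {
    {"template", Debug},   // verbose only in the suspect subsystem
    {"mangle",   Error},   // everything else stays quiet
};

static bool logEnabled(const std::string &cat, Level lvl) {
    auto it = logConfig.find(cat);
    return it != logConfig.end() && lvl <= it->second;
}

#define LOG(cat, lvl, ...) \
    do { \
        if (logEnabled(cat, lvl)) { \
            std::printf("[%s] ", cat); \
            std::printf(__VA_ARGS__); \
            std::printf("\n"); \
        } \
    } while (0)
```

With something like this in place, turning Debug on for only the category under suspicion keeps the output manageable — which is exactly what an all-or-nothing log can't do.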

On 11/9/2012 11:44 PM, Walter Bright wrote:
[...]
There is some async code in there. If I suspect a problem with it, I've left in the single thread logic, and switch to that in order to make it deterministic.

But that doesn't tell you what the problem is. It just lets you escape to something functional by giving up on the parallelism. Logs at least tell you the running state in the parallel case, which is often enough to guess at what is wrong. Trying to find a synchronization bug in parallel code is pretty darned difficult in a debugger (for what I hope are obvious reasons).
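As a sketch of what I mean (hypothetical names, not dmd code): tagging each log line with a thread id preserves the actual interleaving, which is precisely what single-stepping in a debugger perturbs or hides:

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Each line records which thread wrote it, so the interleaving of
// parallel work remains visible in the log afterwards.
static std::mutex logMutex;
static std::vector<std::string> logLines;  // stands in for the log file

void logLine(int tid, const std::string &msg) {
    std::lock_guard<std::mutex> lock(logMutex);  // serialize writers
    logLines.push_back("[thread " + std::to_string(tid) + "] " + msg);
}

void runWorkers(int n) {
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([i] {
            logLine(i, "start");
            logLine(i, "done");
        });
    for (std::thread &t : workers) t.join();
}
```

Reading such a log after a failing run often shows, say, a "done" arriving before a "start" it was supposed to wait for — the kind of ordering evidence a debugger destroys by altering timing.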

[...]
Actually, very very few bugs manifest themselves as seg faults. I mentioned before that I regard the emphasis on NULL pointers to be wildly excessive.

I would like to define a metric, which I call "bug depth". Suppose that incorrect program behavior is noticed, and the bad behavior is associated with some symbol, S. Now, it could be that there is a problem with the immediate computation of S, whatever that might be (I mean, in the same lexical scope). Or, it could be that S is merely a victim of a bad computation somewhere else (i.e.: the computation of S received a bad input from some other computation). Let us call the bad input S'. Now, it again may be the case that S' is a first-order bad actor, or that it is the victim of a bug earlier in the computation, say, from S''. Let us call the root cause symbol R. Now, there is some trail of dependencies from R to S which explains the manifestation of the bug. And let us call the number of references which must be followed from S to R the "bug depth".
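As a contrived C++ illustration (every name here is hypothetical), here is a bug of depth 2: the symptom appears at S, its immediate input S' is innocent, and the root cause R is a function that silently returned a bad value:

```cpp
#include <string>

// A depth-2 bug: the failure manifests at S, but the mistake is at R,
// two dependency links away.

std::string configPath() {             // R: root cause -- returns "" when an
    return "";                         //    option is missing, instead of a default
}

std::string readConfig() {             // S': victim -- merely forwards the bad value
    return configPath();
}

char firstChar(const std::string &s) { // S: symptom -- s.at(0) throws here,
    return s.at(0);                    //    far from where the mistake was made
}
```

Staring at firstChar tells you nothing is wrong with it; the fix belongs two references away, in configPath.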

Now that we have this metric, we can talk about "shallow" bugs and "deep" bugs. When a segfault is caused by code immediately surrounding the bad symbol, we can say that the bug causing the segfault is "shallow". And when it is caused by a problem, say, 5 function calls away, in non-trivial functions, it is probably fair to say that the bug is "deep". In my experience, shallow bugs are usually simple mistakes. A programmer failed to check a boundary condition due to laziness, they used the wrong operator, they transposed some symbols, they re-used a variable they shouldn't have, etc. And you know they are simple mistakes when you can show the offending code to any programmer (including ones who don't know the context), and they can spot the bug. These kinds of bugs are easy to identify and fix.

The real problem is when you look at the code where something is failing, and there is no obvious explanation for the failure. Ok, maybe being able to see the state a few frames up the stack will expose the root cause. When this happens, happy day! It's not the shallowest bug, but the stack is the next easiest context in which to look for root causes. The worst kinds of bugs happen when *everyone thinks they did the right thing*, and what really happened is that two coders disagreed on some program invariant. This is the kind of bug which tends to take the longest to figure out, because most of the code and program state looks the way everyone expects it to look. And when you finally discover the problem, it isn't a 1-line fix, because an entire module has been written with this bad assumption, or the code does something fairly complicated that can't be changed easily.

There are several ways to defend against these types of bugs, all of which have a cost. There's the formal route, where you specify all valid inputs and outputs for each function (as documentation). There's the testing route, where you write unit tests for each function. And there's the contract-based route, where you define invariants checked at runtime. In fact, all three are valuable, but the return on investment for each one depends on the scale of the program.

Although I think good documentation is essential for a multi-coder project, I would probably do that last. In fact, the technique which is the cheapest but most effective is to simply assert all your invariants inside your functions. Yes, this includes things you think are silly, like checking for NULL pointers. But it also includes things which are less silly, like checking for empty strings, empty containers, and other assumptions about the input. It's essentially an argument for contract-based programming. D has this feature in the language. It is ironic that it is virtually absent from the compiler itself. There are probably more instances of assert(0) in the code than of any other assert.
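In the compiler's own C++, that advice looks something like this — a hypothetical function, not actual dmd code, and the "mangling" rule is a toy:

```cpp
#include <cassert>
#include <cstring>

// Assert every input assumption up front -- even the "silly" NULL check --
// so a deep bug trips an assert near its root cause instead of
// segfaulting several calls away.
int countTemplateArgs(const char *mangled) {
    assert(mangled);                              // NULL pointers are a bug here
    assert(*mangled != '\0');                     // so are empty strings
    assert(std::strncmp(mangled, "_D", 2) == 0);  // assumed prefix invariant

    int args = 0;
    for (const char *p = mangled + 2; *p; ++p)
        if (*p == 'T')                            // toy counting rule, not real mangling
            ++args;
    return args;
}
```

The point is that when a caller five frames up hands this function garbage, the assert fires with the bad value in hand, shrinking the bug depth you have to traverse from the symptom back to the root cause.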

DMD has a fair number of open bugs left, and if I had to guess, the easy ones have already been cherry-picked. That means the remaining ones are far more likely to be deep bugs rather than shallow ones. And the only way I know how to attack deep bugs (both proactively and reactively) is to start making assumptions explicit (via assertions, exceptions, documentation), and to give the people debugging a visualization of what is happening in the program via logs/debug output. Oftentimes, a log file will show patterns that give you a fuzzy, imprecise sense of what is happening that is still useful, because when a bug shows up, it disrupts the pattern in some obvious way. This is what I mean by "visualizing the flow". It's being able to step back from the bark-staring which is single-stepping, and trying to look at a stand of trees in the forest.

Dave

_______________________________________________
dmd-internals mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/dmd-internals
