On 6/6/17 2:51 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, Jun 06, 2017 at 02:23:59PM -0400, Steven Schveighoffer via 
Digitalmars-d wrote:
On 6/6/17 12:39 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, Jun 06, 2017 at 02:03:46AM +0000, jmh530 via Digitalmars-d wrote:
On Tuesday, 6 June 2017 at 00:46:00 UTC, Stefan Koch wrote:

Time to find this: roughly 2 weeks.


Damn. That's some commitment.

2 weeks is not bad for subtle bugs in complex code like this one. In
my day job I've seen bugs that took 2 *months* to figure out. One of
them involved a rare race condition that could only be reproduced
under very specific circumstances, and it took a long time and a lot
of guesswork before a coworker and I discovered the exact combination
that triggered it, which finally led us to the subtle problem in a
piece of code that had looked perfectly innocuous until then.

Oh, I've had those before. I had a race condition that reproduced
*randomly* and usually took about 2 weeks to happen, and that was
while pounding on it non-stop. The result was a deadlock, and any
debugging after the fact turned up no clues.

The only way I solved it was to print out the state as it went, so I
could see what happened at the point where things went bad. I think
it took at least 2 of those cycles to find it.
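For anyone who hasn't fought one of these, here is a tiny, hypothetical
C/pthreads sketch of that "print the state as it goes" approach (this is
not the actual code from the story; the worker and lock names are
invented). The two workers take the same two locks in opposite order, so
they only deadlock when the timing lines up, and the running trace shows
the last state each thread reached once the program finally hangs:

/* Build with: cc -pthread trace_deadlock.c
 * Hypothetical sketch: instrument every state transition so that when a
 * rare deadlock finally hits, the log shows where each thread got stuck. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void trace(const char *who, const char *state)
{
    fprintf(stderr, "[%s] %s\n", who, state);
}

static void *worker1(void *arg)
{
    (void)arg;
    for (;;) {
        trace("worker1", "waiting for A");
        pthread_mutex_lock(&lock_a);
        trace("worker1", "holding A, waiting for B");  /* last line you see on a hang */
        pthread_mutex_lock(&lock_b);
        trace("worker1", "holding A and B");
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
    }
    return NULL;
}

static void *worker2(void *arg)
{
    (void)arg;
    for (;;) {
        trace("worker2", "waiting for B");
        pthread_mutex_lock(&lock_b);
        trace("worker2", "holding B, waiting for A");  /* opposite lock order: the latent deadlock */
        pthread_mutex_lock(&lock_a);
        trace("worker2", "holding A and B");
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker1, NULL);
    pthread_create(&t2, NULL, worker2, NULL);
    pthread_join(t1, NULL);   /* never returns once the threads deadlock */
    pthread_join(t2, NULL);
    return 0;
}

Writing the trace to stderr matters here: stderr is unbuffered by default,
so the last few lines aren't still sitting in a stdio buffer when
everything stops.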

This kind of stuff makes you appreciate how important it is to avoid
race conditions and memory corruption.
[...]

Yeah, race conditions and memory corruption / pointer bugs are the worst
to track down.  Since the codebase I deal with is in C, there are plenty
of opportunities for slip-ups that lead to pointer bugs.  And the worst
of them are dangling pointers... you write to them, and there's no SEGV
because they point to valid memory, but that memory has been allocated
to something else now.  By the time the corruption manifests itself,
you're already long, long past the original buggy code, usually in some
completely innocent code that you can stare at for weeks or months and
not find a single flaw.  Meanwhile the original bug randomly corrupts
different things depending on who gets the memory pointed to by the bad
pointer, making it almost impossible to reproduce. Even after you
reproduce it, you've no idea how to trace it to the original cause,
because you're long past where it happened.  And it's almost impossible
to narrow it down, because reducing the test case may make the bad
pointer corrupt something else that you don't see, so you don't know if
the bug is still happening or not.
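To make that failure mode concrete, here is a deliberately broken,
hypothetical C sketch (the struct and field names are invented). Whether
the stale write actually lands on the unrelated allocation depends
entirely on the allocator handing out the freed block again, which is
exactly why this kind of bug reproduces so unreliably:

/* Deliberately buggy: writing through 'stale' is undefined behavior, and
 * there is no SEGV because the memory is still valid; it just belongs to
 * someone else now. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct session { char user[16]; };      /* the object that gets freed */
struct config  { char log_path[16]; };  /* the innocent bystander */

int main(void)
{
    struct session *s = malloc(sizeof *s);
    strcpy(s->user, "alice");

    free(s);                        /* s is now dangling... */
    struct session *stale = s;      /* ...but a copy of it lives on elsewhere */

    /* Some completely unrelated allocation may be handed the same block. */
    struct config *cfg = malloc(sizeof *cfg);
    strcpy(cfg->log_path, "/var/log/app");

    /* The original bug, far away from any code that touches 'cfg': */
    strcpy(stale->user, "bob");     /* may quietly stomp on cfg->log_path */

    printf("log_path = %s\n", cfg->log_path);   /* may now print "bob" */
    free(cfg);
    return 0;
}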

Things like this make you *really* appreciate D features like bounds
checking and the oft-maligned but life-saving GC.

Yep, there were memory errors too. We used a proprietary tool called Purify, which was like Valgrind (this was before Valgrind existed), to find those. I think most of them were either double-freeing/deleting something (usually in a destructor that was called more than once -- always set the members you deleted to null), or calling free on new'd memory / delete on malloc'd memory. I thought I had caught everything, and then came the hang. At first we thought it was a memory issue that Purify hadn't caught, but eventually we found it, as I described above.
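For what it's worth, the "set it to null after you delete it" habit looks
like this in plain C (a hypothetical sketch; the original context was C++
destructors, but the pattern is the same, since free(NULL), like deleting
a null pointer, is defined to do nothing):

#include <stdlib.h>

struct connection {
    char *buffer;
};

/* Safe to call more than once: after the first call the member is NULL,
 * and free(NULL) is a no-op, so an accidental second cleanup is harmless. */
void connection_cleanup(struct connection *c)
{
    free(c->buffer);
    c->buffer = NULL;   /* without this line, a second call double-frees */
}

int main(void)
{
    struct connection c = { .buffer = malloc(256) };

    connection_cleanup(&c);
    connection_cleanup(&c);   /* called again by mistake: now harmless */
    return 0;
}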

I'd characterize memory corruption bugs as errors that occur randomly and can manifest as just about any behavior. Even nastier, they sometimes show up in code you didn't even touch, because the corruption was *always* happening; it just didn't cause a visible bug until you shifted the memory layout around slightly. Race conditions are also generally random, but they usually manifest the same way each time. Both are nasty to find and debug.
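A tiny, hypothetical C sketch of that first kind of nastiness (invented
struct, invented fields): the off-by-one write below has been wrong since
the day it was written, but whether it corrupts anything you care about
depends purely on what happens to sit next to the buffer. Remove or
reorder a field in an "unrelated" cleanup and a bug suddenly appears in
code nobody touched:

/* Deliberately buggy: strcpy writes the terminating '\0' one byte past
 * 'name', and always has. What that byte lands on is decided by the
 * struct layout, not by this code. */
#include <stdio.h>
#include <string.h>

struct record {
    char name[7];
    /* char spare;     (an unused byte used to sit here and silently
     *                  absorbed the overrun; it was removed later) */
    char flag;         /* now sits directly past 'name' */
};

int main(void)
{
    struct record r;
    r.flag = 1;

    /* "ptember" is 7 characters, so with the terminating '\0' this
     * copies 8 bytes into a 7-byte array. */
    strcpy(r.name, "ptember");

    printf("flag = %d\n", r.flag);   /* may now print 0 instead of 1 */
    return 0;
}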

I don't miss those days :)

-Steve
