Martin Cerveny wrote:
DTrace does allow for some very unobtrusive
observation. Enabling every fbt::: probe is always going to have a measurable
effect on performance.

Rather, I suspect a latent race condition, the window
for which is opened once the code has been slowed down by the sledge hammer of enabling every fbt probe.

OK, this should be some race condition. I get the "error" even with only 61 probes enabled, all of them in PCFS (they fired 33146 times per 10 mkdir iterations in the test line).

dtrace -Fn 'syscall::mkdir*:entry { self->trc=1 }
    syscall::mkdir*:return { self->trc=0 }
    fbt:pcfs:pc_*:return /self->trc/ { }' > /dev/null

I noticed a very funny fact: changing fbt:pcfs:pc_*:return -> fbt:pcfs:pc_*:entry
removes the "error".
I am very unhappy :-(

Welcome to the joys of parallel programming! Sounds like you've got a class "A" Heisenbug on your hands (see, e.g. http://catb.org/jargon/html/H/heisenbug.html).

Seriously, I wouldn't blame this on DTrace. I've had Heisenbugs exactly like you describe, and they're a royal pain precisely because the slightest change can open or close timing windows. The worst was the time I declared a local variable before a problematic section of code, assigned hopefully-helpful constants at various points, then sat back and waited for the seg fault... which never came. Adding even 2-3 extra instructions to any one path through the code of interest was too much.

This would be like blaming a debug malloc implementation for failing to catch (or worsening the effects of) a bad pointer access which clobbers malloc's internal state. All bets are off at that point. If anything, be happy that there's a relatively painless way to narrow down the bug's hiding place (by varying which probes are enabled and seeing what happens). The chances of a timely fix are exponentially proportional to the ease of reproducing the problem...
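
As a rough sketch of what "varying which probes are enabled" could look like in practice: the snippet below only prints one dtrace command line per candidate probe pattern, so each can be run in turn while the mkdir test loops. The gen_cmd helper is hypothetical, and the two patterns are simply the ones already mentioned in this thread; narrower patterns would be substituted as the search closes in.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a dtrace command line that enables
# the given probe description only while a mkdir syscall is in flight.
gen_cmd() {
    # $1 = probe description to try, e.g. 'fbt:pcfs:pc_*:return'
    printf "dtrace -Fn 'syscall::mkdir*:entry { self->trc=1 } syscall::mkdir*:return { self->trc=0 } %s /self->trc/ { }' > /dev/null\n" "$1"
}

# Candidate probe sets taken from this thread; shrink these step by step.
for probe in 'fbt:pcfs:pc_*:entry' 'fbt:pcfs:pc_*:return'; do
    gen_cmd "$probe"
done
```

Running each printed command separately and noting which ones make the "error" appear should show which subset of probes opens the timing window.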

Regards,
Ryan

_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org