Martin Cerveny wrote:
DTrace does allow for some very unobtrusive observation. Enabling every
fbt::: probe is always going to have a measurable effect on performance.
Rather, I suspect a latent race condition, the window for which is
opened once the code has been slowed down by the sledge hammer of
enabling every fbt probe.
Ok, this should be some race condition.
I get the "error" even with only 61 probes enabled, all in PCFS (they fired 33146 times per 10 mkdir iterations of the test line).
dtrace -Fn 'syscall::mkdir*:entry { self->trc=1 } syscall::mkdir*:return {
self->trc=0 } fbt:pcfs:pc_*:return /self->trc/ { } ' > /dev/null
I noticed a very funny fact: changing fbt:pcfs:pc_*:return -> fbt:pcfs:pc_*:entry
removes the "error".
I am very unhappy :-(
Welcome to the joys of parallel programming! Sounds like you've got a
class "A" Heisenbug on your hands (see, e.g.
http://catb.org/jargon/html/H/heisenbug.html).
Seriously, I wouldn't blame this on DTrace. I've had Heisenbugs exactly
like you describe, and they're a royal pain precisely because the
slightest change can open or close timing windows. The worst was the
time I declared a local variable before a problematic section of code,
assigned hopefully-helpful constants at various points, then sat back
and waited for the seg fault... which never came. Adding even 2-3 extra
instructions to any one path through the code of interest was too much.
This would be like blaming a debug malloc implementation for failing to
catch (or worsening the effects of) a bad pointer access which clobbers
malloc's internal state. All bets are off at that point. If anything, be
happy that there's a relatively painless way to narrow down the bug's
hiding place (by varying which probes are enabled and seeing what
happens). The chances of a timely fix rise steeply with
the ease of reproducing the problem...
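
To make the "vary which probes are enabled" idea concrete, here is a rough sketch that only prints candidate dtrace command lines, one per probe subset, so each can be run by hand against the mkdir test. It reuses the one-liner quoted above. The two alphabetical globs (pc_[a-m]* and pc_[n-z]*) are purely an assumption for illustration, an arbitrary way to split the pc_* functions in half, not a guess at where the bug lives. The helper name probe_cmd is made up as well.

```shell
#!/bin/sh
# Print (do not run) one dtrace invocation per candidate probe subset.
# The command text is the one-liner from the quoted message; only the
# fbt function glob and the probe name (entry/return) are varied.

probe_cmd() {
    # $1 = function-name glob, $2 = probe name (entry or return)
    printf "dtrace -Fn 'syscall::mkdir*:entry { self->trc=1 } "
    printf "syscall::mkdir*:return { self->trc=0 } "
    printf "fbt:pcfs:%s:%s /self->trc/ { }' > /dev/null\n" "$1" "$2"
}

# Hypothetical split: arbitrary alphabetical halves of the pc_* functions.
for glob in 'pc_[a-m]*' 'pc_[n-z]*'; do
    probe_cmd "$glob" return
done
```

If the "error" shows up with only one of the two halves enabled, split that half again and repeat. Running dtrace -l -n 'fbt:pcfs:pc_*:return' first lists which probes a given glob actually matches, which helps when choosing the next split.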
Regards,
Ryan
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org