For dtrace-discuss, the problem mentioned here is that a DTrace
process straddling a suspend/resume will get killed because of the
deadman timer.  This affects powertop and intrstat (at least)
because they ignore the return value of dtrace_status() and proceed to
show zeroed values for everything.

On Thu, Sep 18, 2008 at 9:00 PM, Aubrey Li <[EMAIL PROTECTED]> wrote:
> On Fri, Sep 19, 2008 at 12:20 AM, Chad Mynhier <[EMAIL PROTECTED]> wrote:
>> On Wed, Sep 17, 2008 at 9:43 PM, Aubrey Li <[EMAIL PROTECTED]> wrote:
>>>
>>> I didn't dig into the dtrace problem; I just wonder, is this expected?
>>> Or is the patch just a temporary workaround, and the dtrace problem
>>> will be fixed eventually?
>>
>> This is actually tickling a safety feature of dtrace, the deadman
>> timer.  There's more information here:
>> http://blogs.sun.com/jonh/entry/the_dtrace_deadman_mechanism, but it's
>> basically a mechanism to prevent dtrace from rendering the system
>> unresponsive.  It's possible that the mechanism could be modified to
>> handle cases like this, but I don't know that it would be a high
>> priority to fix it.
>>
>> I wouldn't say that the patch is just a workaround, though.  The basic
>> problem is that it's ignoring the return value of dtrace_status(), and
>> it really shouldn't be doing that, anyway.
>>
> So all the applications that use libdtrace need this fix for suspend/resume,
> including intrstat/lockstat/plockstat and dtrace itself. No objection from me
> to committing this patch, but I still think this issue should be fixed in
> dtrace; otherwise all the dtrace applications have to use this trick.

I'd agree that this is a bug in DTrace, that it really should be able
to handle all cases.  But I'd also argue that DTrace was designed to
handle issues like this, because dtrace_status() has a meaningful
return value.  It seems to me that the failure of intrstat and
powertop (I haven't looked at lockstat/plockstat yet) to check the
return value of dtrace_status() is a bigger bug, though.  That return
value may be indicating some problem other than the suspend/resume
problem, and that might be a problem that isn't a bug in DTrace.  If
we fix the suspend/resume deadman timer problem, we've only fixed one
of the possible problems, and these utilities might have a similar
failure mode for the other problems.  If we fix those utilities, those
failure modes go away.
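
To make the point about checking the status concrete, here is a minimal
sketch of a sampling loop that stops reporting as soon as the status
call says tracing is no longer healthy, instead of printing zeroed
values.  It deliberately does not link against libdtrace: the
sketch_status names, status_allows_report(), run_intervals() and
fake_status() are all hypothetical stand-ins invented for this
illustration, and the callback takes the place of a real call to
dtrace_status().  Treating a "none" result as safe to continue is an
assumption of the sketch.

```c
#include <stdio.h>

/*
 * Hypothetical stand-ins for the states a status call can report.
 * These names are invented for this sketch; a real consumer would
 * compare against the values dtrace_status() returns.
 */
enum sketch_status {
    SKETCH_ERROR = -1,  /* the status call itself failed */
    SKETCH_NONE,        /* nothing new to report (assumed safe) */
    SKETCH_OKAY,        /* tracing is healthy */
    SKETCH_EXITED,      /* tracing ended on request */
    SKETCH_STOPPED      /* tracing was stopped, e.g. by a timeout */
};

/* Return 1 if this interval's data may be reported, 0 otherwise. */
static int status_allows_report(int status)
{
    switch (status) {
    case SKETCH_NONE:
    case SKETCH_OKAY:
        return 1;
    default:
        return 0;
    }
}

/*
 * Run up to max_intervals sampling intervals.  status_fn is a
 * hypothetical callback standing in for the real status call.
 * Returns how many intervals were reported before the status
 * said to stop.
 */
static int run_intervals(int (*status_fn)(void *), void *arg,
    int max_intervals)
{
    int reported = 0;

    for (int i = 0; i < max_intervals; i++) {
        int st = status_fn(arg);

        if (!status_allows_report(st)) {
            /* Bail out rather than display stale or zeroed data. */
            fprintf(stderr, "tracing no longer active (status %d)\n", st);
            break;
        }
        reported++;
    }
    return reported;
}

/*
 * Fake status source for demonstration: reports "okay" until the
 * counter pointed to by arg reaches zero, then reports "stopped",
 * much as a consumer might see after a resume.
 */
static int fake_status(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_OKAY;
    }
    return SKETCH_STOPPED;
}
```

With a counter of 3, run_intervals(fake_status, &counter, 10) reports
three intervals and then stops.  Because every status other than the
two healthy ones ends the loop, the same check also covers failures
that have nothing to do with suspend/resume.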

Chad
_______________________________________________
dtrace-discuss mailing list
[email protected]
