For dtrace-discuss: the problem under discussion here is that a DTrace process that straddles a suspend/resume gets killed by the deadman timer. This affects powertop and intrstat (at least) because they ignore the return value of dtrace_status() and go on to display zeroed values for everything.
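To make the point about the return value concrete, here is a minimal sketch of the kind of check a consumer's sampling loop could make instead of ignoring the status. It is only an illustration: the enum names and the sketch_* helpers are hypothetical stand-ins so that the fragment compiles without libdtrace, and the mapping of statuses to actions is an assumption rather than what powertop or intrstat actually do. A real tool would include <dtrace.h>, pass the result of dtrace_status() to something like this, and use the DTRACE_STATUS_* constants.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-ins for the status values a libdtrace consumer sees.
 * A real tool would include <dtrace.h>, call dtrace_status(dtp), and use
 * the DTRACE_STATUS_* constants instead of these names.
 */
enum sketch_status {
	SK_STATUS_ERROR = -1,	/* assumed: the status call itself failed */
	SK_STATUS_NONE,		/* nothing new to report yet */
	SK_STATUS_OKAY,		/* tracing is proceeding normally */
	SK_STATUS_EXITED,	/* tracing ended because exit() was called */
	SK_STATUS_FILLED,	/* tracing ended because a buffer filled */
	SK_STATUS_STOPPED	/* tracing was already stopped */
};

enum sketch_action { SK_CONTINUE, SK_FINISH, SK_ABORT };

/* Decide what the display loop should do with one status value. */
enum sketch_action
sketch_handle_status(int status)
{
	switch (status) {
	case SK_STATUS_NONE:
	case SK_STATUS_OKAY:
		return (SK_CONTINUE);
	case SK_STATUS_EXITED:
	case SK_STATUS_FILLED:
	case SK_STATUS_STOPPED:
		return (SK_FINISH);
	default:
		/* -1 or anything unexpected: stop rather than print zeroes */
		return (SK_ABORT);
	}
}

/* One iteration: returns 1 if the caller should keep sampling, else 0. */
int
sketch_iteration(int status, FILE *log)
{
	assert(log != NULL);
	switch (sketch_handle_status(status)) {
	case SK_CONTINUE:
		return (1);
	case SK_FINISH:
		(void) fprintf(log, "tracing finished (status %d)\n", status);
		return (0);
	default:
		(void) fprintf(log, "tracing failed (status %d)\n", status);
		return (0);
	}
}
```

The design choice is simply that anything other than a "keep going" status ends the loop with a message, so a tracing session killed across suspend/resume would show up as an error rather than as a screen full of zeroes.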
On Thu, Sep 18, 2008 at 9:00 PM, Aubrey Li <[EMAIL PROTECTED]> wrote:
> On Fri, Sep 19, 2008 at 12:20 AM, Chad Mynhier <[EMAIL PROTECTED]> wrote:
>> On Wed, Sep 17, 2008 at 9:43 PM, Aubrey Li <[EMAIL PROTECTED]> wrote:
>>>
>>> I didn't dig into the dtrace problem, just wonder is this expected?
>>> Or Is the patch just a workaround temporarily and dtrace problem
>>> will be fixed eventually?
>>
>> This is actually tickling a safety feature of dtrace, the deadman
>> timer. There's more information here:
>> http://blogs.sun.com/jonh/entry/the_dtrace_deadman_mechanism, but it's
>> basically a mechanism to prevent dtrace from rendering the system
>> unresponsive. It's possible that the mechanism could be modified to
>> handle cases like this, but I don't know that it would be a high
>> priority to fix it.
>>
>> I wouldn't say that the patch is just a workaround, though. The basic
>> problem is that it's ignoring the return value of dtrace_status(), and
>> it really shouldn't be doing that, anyway.
>>
> So, all the applications which use libdtrace need this fix for suspend/resume,
> this includes intrstat/lockstat/plockstat and dtrace itself. No object from me
> to commit this patch, but I still think this issue should be fixed in dtrace,
> otherwise all the dtrace applications have to use this trick.

I'd agree that this is a bug in DTrace, and that it really should be able to handle all cases. But I'd also argue that DTrace was designed to handle issues like this, because dtrace_status() has a meaningful return value.

It seems to me that the failure of intrstat and powertop (I haven't looked at lockstat/plockstat yet) to check the return value of dtrace_status() is a bigger bug, though. That return value may be indicating some problem other than the suspend/resume problem, and that might be a problem that isn't a bug in DTrace.
If we fix the suspend/resume deadman timer problem, we've only fixed one of the possible problems, and these utilities might have a similar failure mode for the other problems. If we fix those utilities, those failure modes go away.

Chad
_______________________________________________
dtrace-discuss mailing list
[email protected]
