On Mon, Oct 17, 2011 at 06:07:00PM -0700, Arun Sharma wrote:
> On 10/15/11 12:29 PM, Peter Zijlstra wrote:
> >On Sat, 2011-10-15 at 21:22 +0200, Peter Zijlstra wrote:
> >
> >>>Sleep time should really just be a different notion of 'cost of the
> >>>function/callchain' and fit into the existing scheme, right?
> >>
> >>The problem with andrew's patches is that it wrecks the callchain
> >>semantics. The waittime tracepoint is in the wakeup path (and hence
> >>generates the wakee's callchain) whereas they really want the callchain
> >>of the woken task to show where it spend time.
> >
> >We could of course try to move the tracepoint into the schedule path, so
> >we issue it the first time the task gets scheduled after the wakeup, but
> >I suspect that will just add more overhead, and we really could do
> >without that.
>
> Do we need to define new tracepoints? I suspect we could make the
> existing ones:
>
>     trace_sched_stat_wait()
>     trace_sched_stat_sleep()
>
> work for this purpose. The length of time the task was not on the
> cpu could then be computed as: sleep+wait. The downside is that the
> complexity moves to user space.
>
>     perf record -e sched:sched_stat_sleep,sched:sched_stat_wait,...
>
> Re: changing the semantics of tracepoint callchains
>
> Yeah - this could be surprising. Luckily, most tracepoints retain
> their semantics, but a few special ones don't. I guess we just need
> to document the new behavior.

It's not only a problem of semantics, although that alone is a problem.
People seldom read the documentation for corner cases, so we should
really stay consistent here: if remote callchains are really needed, we
want a specific interface for that rather than abusing the existing one,
which would only confuse people.

Now, I still think doing remote callchains is asking for trouble: we need
to ensure the target is really sleeping and is not going to be scheduled
concurrently, otherwise you might get weird or stale results. So the user
needs to know which tracepoints are safe for this.

Then comes the problem of dealing with remote callchains in userspace:
the event comes from one task but the callchain is from another. You need
the perf tools to handle remote dsos/mappings/symbols etc... That's a lot
of unnecessary complication.

I think we should use something like a perf report plugin: perhaps
something that can create a virtual event on top of real ones: process
the sched:sched_switch events, find the time tasks are sleeping, and
create virtual sleep events on top of that with a period weighted by the
sleep time.

Just a thought.
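
To make the virtual-event idea a bit more concrete, here is a minimal Python sketch of the post-processing step. It is not perf code and not an existing plugin interface: the `SwitchEvent` and `VirtualSleepEvent` records and the `synthesize_sleep_events()` helper are hypothetical names invented for illustration. The only things borrowed from the real sched:sched_switch tracepoint are the `prev_pid`, `next_pid` and `prev_state` fields, with `prev_state == 0` taken to mean that the outgoing task was preempted while still runnable. The sketch also assumes the callchain recorded with each switch belongs to the task being switched out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SwitchEvent:
    """Hypothetical, simplified view of one sched:sched_switch sample."""
    time_ns: int
    prev_pid: int
    next_pid: int
    prev_state: int                  # 0 = still runnable (preempted)
    prev_callchain: Tuple[str, ...]  # assumed: callchain of the outgoing task


@dataclass(frozen=True)
class VirtualSleepEvent:
    """Synthetic event whose period is the time the task spent off the CPU."""
    pid: int
    time_ns: int
    period_ns: int
    callchain: Tuple[str, ...]


def synthesize_sleep_events(events: Iterable[SwitchEvent]) -> List[VirtualSleepEvent]:
    # pid -> (time it was switched out, callchain at that point)
    off_cpu: Dict[int, Tuple[int, Tuple[str, ...]]] = {}
    virtual: List[VirtualSleepEvent] = []

    for ev in sorted(events, key=lambda e: e.time_ns):
        # The incoming task is back on a CPU: close its pending interval.
        pending = off_cpu.pop(ev.next_pid, None)
        if pending is not None:
            since, chain = pending
            virtual.append(
                VirtualSleepEvent(ev.next_pid, ev.time_ns, ev.time_ns - since, chain)
            )

        # Only track tasks that blocked; skip preempted tasks and the idle
        # task (pid 0), which is not sleeping in any useful sense.
        if ev.prev_state != 0 and ev.prev_pid != 0:
            off_cpu[ev.prev_pid] = (ev.time_ns, ev.prev_callchain)

    return virtual
```

Note that with sched_switch alone the period runs from switch-out to the next switch-in, so it covers the sleep plus the time spent waiting on the runqueue (the sleep+wait sum mentioned above). Separating the two would need the wakeup events as well. Since each virtual event carries the callchain of the task that went to sleep, the report side can sort and weight it like any other sample, and no remote callchain is needed.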