On Fri, Mar 11, 2011 at 7:57 PM, Frederic Weisbecker <[email protected]> wrote:
> On Thu, Mar 10, 2011 at 10:32:43PM +0800, Sam Liao wrote:
>> On Thu, Mar 10, 2011 at 10:43 AM, Frederic Weisbecker
>> <[email protected]> wrote:
>> > On Tue, Mar 08, 2011 at 04:59:30PM +0800, Sam Liao wrote:
>> >> On Tue, Mar 8, 2011 at 2:06 AM, Frederic Weisbecker <[email protected]>
>> >> wrote:
>> >> > So, instead of having such a temporary copy, could you rather feed the
>> >> > callchain into the cursor in reverse from perf_session__resolve_callchain()?
>> >> >
>> >> > You can keep the common part inside the loop in a separate helper
>> >> > but have two different kinds of loops.
>> >>
>> >> In perf_session__resolve_callchain(), only the callchain itself can be
>> >> reversed, which means the root of the report will still be the ip of the
>> >> event, with a reversed callchain subtree. But what is more impressive to
>> >> the user is to make a "main"-like function the root of the report, and
>> >> that means both the ip and the callchain are involved in the reversal.
>> >>
>> >> Since the ip of the event is resolved in event__preprocess_sample(), it
>> >> is kind of hard to do such a reversal in a better way.
>> >
>> > You are making an interesting point.
>> >
>> > My view of this feature was limited to the current per-hist area: having
>> > the callchains on top of hists that can be sorted per ip, dso, pid, etc.,
>> > like we have today, basically. So my view was for this reverse callchain
>> > to show us one caller profile for each hist entry.
>> >
>> > But your idea of turning the callee into the caller would show us a very
>> > global profile. With reverse callchains it can be a very nice overview of
>> > the big picture.
>> >
>> > IMO both workflows can be interesting:
>> >
>> > 1) Have a big reversed callchain overview, with one root per entrypoint.
>> >    This is what you wanted.
>> > 2) Have a per-hist 1), which means a per-hist, per-entrypoint callchain.
>> >
>> > 1) involves reverting both callchains and ip <-> caller, whereas 2) only
>> > involves reverting the callchain.
>>
>> Having both workflows included would be more helpful.
>
> That's the point, we should be able to do both. But only 1) is possible with
> your initial proposition.
>
>> >
>> > In order to get both features with maximum flexibility and keep that
>> > extendable, I would suggest decoupling this into two independent parts:
>> >
>> >  - an option to get reversed callchains, using the -g option and
>> >    caller/callee as a third argument.
>>
>> This could be easily extended by reversing the callchain symbols as
>> you mentioned.
>
> Yeah. -g caller only requires iterating the callchain in reverse.
>
>> >  - a new "caller" sort entry. What defines a hist entry is a set of sort
>> >    entries: dso, symbol, pid, comm, ... that we use with the -s option
>> >    in perf report.
>> >    If you want one hist per entrypoint, we could add a new "caller" sort
>> >    entry. Then perf report -s caller will (roughly) produce one hist for
>> >    main(), one hist for kernel_thread(), etc.
>>
>> I'm not sure adding a "caller" sort entry can get things done. To my
>> limited understanding, "sort" is a kind of way to group events
>
> This is actually _what_ groups events. This defines how hist entries are
> built.
>
> If you do "perf report -s sym", events will be grouped by symbols.
> Thus if you had a thousand events but all of them only hit sym1 and sym2,
> then you'll see two groups in your histogram.
>
> Look:
>
> # ./perf report -s sym --stdio
> # Events: 4  cycles
> #
> # Overhead  Symbol
> # ........  .................
> #
>     36.72%  [.] hex2u64
>     31.21%  [k] __lock_acquire
>     18.03%  [k] lock_acquire
>     14.04%  [k] sub_preempt_count
>
> We may have got a thousand events for the above profile. But only 4 symbols
> were hit amongst these thousand events.
> As we asked for, events have been grouped per symbol target.
>
> Callchains follow this grouping scheme. Below the __lock_acquire hist,
> you would only get callchains for which the root (deepest callee) was
> __lock_acquire.
>
> If you have several groupings, like -s sym,dso,pid,
> then it computes an intersection. Events will be grouped when their
> sym, dso and pid are all equal. Moreover they will be sorted: first
> dimension per sym, second dimension per dso, third dimension per pid.
>
> You should play a bit with different combinations to get the whole picture
> of how it works.
>
> Callchains still follow the grouping, however elaborate it is. For the hist
> that has sym1, dso2 and pid3, you'll find only callchains that start from
> sym1, for events that happened on dso2 and pid3.
>
>> , after we group all the events under "main" or "kernel_thread",
>> the sub-trees will still be rooted at ip entry points with reversed
>> callchain sub-trees, which seems just the same as the previous workflow.
>> Am I right? If so, here we still have to revert the ip and the callchain.
>
> No. The callchain will follow that grouping. If you group only per caller
> (-s caller) you may have one hist entry for main and another for
> kernel_thread. Then below the main entry, you'll have only callchains
> starting from main. And below the kernel_thread entry, only callchains
> starting from kernel_thread.
>
> It depends on whether you select reverse callchains or not:
>
> $ perf report -s caller
>
> That will report main and kernel_thread as hists, with regular callee ->
> caller callchains. Hence under the main hist, you'll see a lot of
> callchains starting from random points and all ending in main!
>
> $ perf report -s caller -g caller
>
> That will report main and kernel_thread as hists, with callchains starting
> from main under main.
>
> It becomes interesting when you want more granularity with -s caller,dso,
> if we bring a way to push forward the entrypoint one day.
> I suspect even more sorting combinations are going to be interesting.
>
Thanks for the clarification. I'll try to come up with patches along the
lines you described.

-Sam
--
To unsubscribe from this list: send the line "unsubscribe linux-perf-users" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
