Re: [PATCH 1/8] perf callchain: Convert children list to rbtree

Frederic Weisbecker Wed, 02 Oct 2013 03:19:10 -0700

On Thu, Sep 26, 2013 at 05:58:03PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <[email protected]>
> 
> Current collapse stage has a scalability problem which can be
> reproduced easily with a parallel kernel build.  This is because it
> needs to traverse every child of a callchain node linearly during the
> collapse/merge stage.  Converting it to an rbtree reduced the overhead
> significantly.
> 
> On my 400MB perf.data file, which was recorded during a make -j32 kernel build:
> 
>   $ time perf --no-pager report --stdio > /dev/null
> 
> before:
>   real        6m22.073s
>   user        6m18.683s
>   sys         0m0.706s
> 
> after:
>   real        0m20.780s
>   user        0m19.962s
>   sys         0m0.689s
> 
> During the perf report the overhead on append_chain_children went down
> from 96.69% to 18.16%:
> 
>   -  18.16%  perf  perf                [.] append_chain_children
>      - append_chain_children
>         - 77.48% append_chain_children
>            + 69.79% merge_chain_branch
>            - 22.96% append_chain_children
>               + 67.44% merge_chain_branch
>               + 30.15% append_chain_children
>               + 2.41% callchain_append
>            + 7.25% callchain_append
>         + 12.26% callchain_append
>         + 10.22% merge_chain_branch
>   +  11.58%  perf  perf                [.] dso__find_symbol
>   +   8.02%  perf  perf                [.] sort__comm_cmp
>   +   5.48%  perf  libc-2.17.so        [.] malloc_consolidate
> 
> Reported-by: Linus Torvalds <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Link: http://lkml.kernel.org/n/[email protected]
> Signed-off-by: Namhyung Kim <[email protected]>


Have you tested this patchset when collapsing is not used?
There's a fair chance that this patchset improves not only collapsing
but also callchain insertion in general, so it's probably a win in any
case. Still, it would be nice to confirm that, because we are getting
rid of collapsing anyway.

The test that could tell us about that is to run "perf report -s sym" and
compare the time it takes to complete before and after this patch,
because "-s sym" shouldn't involve collapses.

Sorting by anything that is not comm should do the trick in fact.

Thanks.
