On Mon, 30 Sep 2013, Johan Tibell wrote:

> I'm trying to use 'perf record' to find the hotspot inside a function.
> The results I'm seeing are confusing. 'annotate' claims that a simple
> 'add' instruction is the main CPU time consumer:
> 
>         0.00 :          408246:       mov    0x8(%rbp),%rsi
>         0.31 :          40824a:       mov    0x10(%rbp),%rcx
>         0.00 :          40824e:       lea    0x4(%rcx),%rdi
>         0.81 :          408252:       mov    0x6(%rbx),%rbx
>         0.00 :          408256:       mov    0x18(%rbx,%rax,8),%r8
>  --->  27.58 :          40825b:       add    $0x20,%rbp
>         0.00 :          40825f:       jmpq   408338 <Main_zdwpolyzugo_info>
>         0.00 :          408264:       mov    $0x7b9609,%ebx
>         0.00 :          408269:       add    $0x20,%rbp
>         0.00 :          40826d:       jmpq   *0x0(%rbp)
>         5.41 :          408270:       mov    0xd(%rbx),%rax
> 
> How shall I interpret this? Is it really the following jump that is to blame?

It looks like you're seeing "skid": the PC reported for a sample lands a
little past the instruction that actually caused the event, because it is
hard for the CPU to stop in exactly the right place.

Typically the problem instruction will be a few instructions before the
reported one, not after it.  I wouldn't expect an unconditional branch to
have much performance impact.
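
As a rough illustration of reading the listing with skid in mind, the
Python sketch below takes the rows from the quoted annotate output, finds
the row with the highest percentage, and lists the instructions just before
it as candidates for the real cost.  The helper name skid_candidates and
the window of 3 instructions are assumptions made for the example ("a few"
is not a fixed number), and the result is only a list of places to look,
not a diagnosis.

```python
# Rows copied from the quoted annotate output: (address, percent, instruction).
LISTING = [
    (0x408246, 0.00, "mov 0x8(%rbp),%rsi"),
    (0x40824a, 0.31, "mov 0x10(%rbp),%rcx"),
    (0x40824e, 0.00, "lea 0x4(%rcx),%rdi"),
    (0x408252, 0.81, "mov 0x6(%rbx),%rbx"),
    (0x408256, 0.00, "mov 0x18(%rbx,%rax,8),%r8"),
    (0x40825b, 27.58, "add $0x20,%rbp"),
    (0x40825f, 0.00, "jmpq 408338 <Main_zdwpolyzugo_info>"),
]


def skid_candidates(listing, window=3):
    """Return the hottest row and up to `window` rows that precede it.

    `window` is a hypothetical choice; skid distance varies by CPU and event.
    """
    hot_index = max(range(len(listing)), key=lambda i: listing[i][1])
    start = max(0, hot_index - window)
    return listing[hot_index], listing[start:hot_index]


hot, candidates = skid_candidates(LISTING)
print(f"reported hot spot: {hot[0]:x}  {hot[2]}")
for addr, pct, insn in candidates:
    print(f"  candidate:       {addr:x}  {insn}")
```

For the listing above this reports 40825b as the hot spot and 40824e,
408252 and 408256 as the instructions worth inspecting first.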

What CPU do you have?  What event are you measuring?  If you have a new 
enough system you can look into using the (I think) ":p" flags to try to 
get more precise results, if you aren't already.
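
For reference, the precise modifier is written as one to three "p"
characters after the event name, as described in the perf-list man page.
The Python sketch below only builds such a command line; the "cycles"
event and the ./a.out program are assumptions, since the original question
does not say which event was measured, and whether a given precise level
works depends on the CPU.

```python
import shlex


def precise_event(event, level):
    """Append `level` 'p' characters to an event name, e.g. cycles -> cycles:pp."""
    if not 0 <= level <= 3:
        raise ValueError("precise level must be between 0 and 3")
    return event if level == 0 else f"{event}:{'p' * level}"


def record_command(event, level, program):
    """Build a `perf record` argument list for the given event and program."""
    return ["perf", "record", "-e", precise_event(event, level), "--", *program]


# Hypothetical event and program name, chosen only for illustration.
cmd = record_command("cycles", 2, ["./a.out"])
print(shlex.join(cmd))
```

If the hardware rejects the requested level, perf record reports an error
when opening the event, so trying a lower level is a reasonable fallback.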

Vince
