On Mon, Apr 21, 2014 at 10:33 AM, Yuri Gribov <[email protected]> wrote:
>>> So even though my implementation is slightly faster we're still
>>> getting a 70% perf hit.
>> interesting.
>>
>> can you show the assembly (objdump -d) for __asan_load8 in both variants?
>
> My disas:
>
> 00000000004cf6a0 <__asan_load8>:
>   4cf6a0:       48 89 f8                mov    %rdi,%rax
>   4cf6a3:       48 c1 e8 03             shr    $0x3,%rax
>   4cf6a7:       80 b8 00 80 ff 7f 00    cmpb   $0x0,0x7fff8000(%rax)
>   4cf6ae:       75 08                   jne    4cf6b8 <__asan_load8+0x18>
>   4cf6b0:       f3 c3                   repz retq
>   4cf6b2:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   4cf6b8:       e9 f3 55 fc ff          jmpq   494cb0 <__asan_report_load8>
>   4cf6bd:       0f 1f 00                nopl   (%rax)
>

You call __asan_report_load8 directly.
This will only work if __asan_load8 does not have its own frame.
While this is true in your disasm, I am not sure we can rely on it,
so I extract pc/bp and call __asan_report_error.
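
To make the pc/bp approach concrete, here is a minimal sketch in C, not the
actual runtime code. It assumes GCC/Clang builtins, and everything prefixed
with sim_ is hypothetical: a small simulated shadow array stands in for the
real shadow mapping at 0x7fff8000, and a stub stands in for
__asan_report_error. Addresses are offsets into the simulated region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a tiny simulated address space and its shadow
   (one shadow byte per 8 application bytes, hence the >> 3). */
enum { SIM_APP_BYTES = 1024, SIM_SHADOW_BYTES = SIM_APP_BYTES >> 3 };
static uint8_t sim_shadow[SIM_SHADOW_BYTES];
static int sim_report_count;
static uintptr_t sim_last_addr;

/* Stub with the same argument order the trunk disassembly passes:
   pc, bp, sp, addr, is_write, access_size. It only records the call. */
static void sim_report_error(uintptr_t pc, uintptr_t bp, uintptr_t sp,
                             uintptr_t addr, int is_write, size_t size) {
    (void)pc; (void)bp; (void)sp; (void)is_write; (void)size;
    sim_report_count++;
    sim_last_addr = addr;
}

/* Mark the 8-byte granule that contains addr as poisoned. */
static void sim_poison(uintptr_t addr) {
    assert(addr < SIM_APP_BYTES);
    sim_shadow[addr >> 3] = 0xff;
}

/* Out-of-line 8-byte load check. pc and bp are read explicitly in the
   callback, so the report does not depend on whether the callback
   happens to have its own frame. */
__attribute__((noinline)) void sim_load8(uintptr_t addr) {
    assert(addr < SIM_APP_BYTES);
    if (sim_shadow[addr >> 3] != 0) {
        uintptr_t local;  /* its address approximates the current sp */
        uintptr_t pc = (uintptr_t)__builtin_return_address(0);
        uintptr_t bp = (uintptr_t)__builtin_frame_address(0);
        sim_report_error(pc, bp, (uintptr_t)&local, addr, 0, 8);
    }
}
```

The extra frame setup on the slow path is what shows up as the
push %rbp / mov %rsp,%rbp sequence in the trunk disassembly.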


> And here's the trunk version:

My version lacked the UNLIKELY trick. Fixed.
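
For readers not familiar with the trick, a minimal sketch, assuming the usual
GCC/Clang __builtin_expect form of the macro. check_shadow_byte and slow_path
are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

static int slow_path_hits;

/* Hypothetical slow path; in the real callback this would be the report. */
static void slow_path(void) { slow_path_hits++; }

/* Returns 1 if the shadow byte is nonzero (slow path taken), else 0.
   With the hint, the compiler is free to lay out the zero-shadow case
   as the fall-through path. */
int check_shadow_byte(uint8_t shadow_byte) {
    if (UNLIKELY(shadow_byte != 0)) {
        slow_path();
        return 1;
    }
    return 0;
}
```

The hint affects only code layout, not the result of the check.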

>
> 0000000000493b00 <__asan_load8>:
>   493b00:       48 89 f8                mov    %rdi,%rax
>   493b03:       48 c1 e8 03             shr    $0x3,%rax
>   493b07:       80 b8 00 80 ff 7f 00    cmpb   $0x0,0x7fff8000(%rax)
>   493b0e:       74 40                   je     493b50 <__asan_load8+0x50>
>   493b10:       48 8b 05 51 83 26 00    mov    0x268351(%rip),%rax        # 6fbe68 <_DYNAMIC+0x13a0>
>   493b17:       48 8b 00                mov    (%rax),%rax
>   493b1a:       48 85 c0                test   %rax,%rax
>   493b1d:       74 09                   je     493b28 <__asan_load8+0x28>
>   493b1f:       48 89 38                mov    %rdi,(%rax)
>   493b22:       c3                      retq
>   493b23:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>   493b28:       55                      push   %rbp
>   493b29:       48 89 f9                mov    %rdi,%rcx
>   493b2c:       41 b9 08 00 00 00       mov    $0x8,%r9d
>   493b32:       45 31 c0                xor    %r8d,%r8d
>   493b35:       48 89 e5                mov    %rsp,%rbp
>   493b38:       48 83 ec 10             sub    $0x10,%rsp
>   493b3c:       48 8b 7d 08             mov    0x8(%rbp),%rdi
>   493b40:       48 8d 55 f8             lea    -0x8(%rbp),%rdx
>   493b44:       48 89 ee                mov    %rbp,%rsi
>   493b47:       e8 64 e3 ff ff          callq  491eb0 <__asan_report_error>
>   493b4c:       c9                      leaveq
>   493b4d:       0f 1f 00                nopl   (%rax)
>   493b50:       f3 c3                   repz retq
>   493b52:       66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
>   493b59:       1f 84 00 00 00 00 00
>
>> If you want to rely on a custom ABI, you should implement it on both
>> callee and caller sides.
>
> Sure.
>
>> That might indeed improve the speed, but imho is not worth it here.
>
> I still think that most of the overhead comes from ABI overheads (IMHO
> x86/amd64 are particularly bad at this). E.g. removing _all_ code from
> callbacks results in 16 sec runtime (so callback code overhead is only
> (17.3 - 16)/(17.3 - 11) =  20%) so improving it further is practically
> worthless.
>
> -Y
>
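
The 20% figure quoted above can be reproduced with a few lines of C. This is
only a sketch of the arithmetic. The parameter names are mine, and reading
11 sec as the reference runtime without callbacks is an assumption based on
how the numbers are used in the quoted formula.

```c
/* Share of the total callback overhead that comes from the code inside
   the callbacks, as opposed to the cost of making the calls.
   full_s:      runtime with the real callbacks (17.3 in the quoted mail)
   empty_s:     runtime with all code removed from callbacks (16)
   reference_s: reference runtime (11; assumed to mean no callbacks) */
double callback_body_share(double full_s, double empty_s, double reference_s) {
    return (full_s - empty_s) / (full_s - reference_s);
}
```

With the quoted numbers this is 1.3 / 6.3, or about 0.21, so roughly four
fifths of the overhead remains even with empty callbacks.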
> --
> You received this message because you are subscribed to the Google Groups 
> "address-sanitizer" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
