On Wed, Aug 27, 2014 at 9:07 AM, Jonas Wagner <[email protected]> wrote:
> Hi,
>
> I am investigating the performance of AddressSanitizer checks. In the
> process, I've found a few checks that look as follows:
>
>      1: 0.27 |        mov    %rax,%rcx
>      2: 0.01 |        shr    $0x3,%rcx
>      3: 0.03 |        mov    0x7fff8000(%rcx),%cl
>      4: 0.57 |        test   %cl,%cl
>      5: 0.29 |     |<<je     abe
>      6:      |     |  mov    %eax,%edx
>      7:      |     |  and    $0x7,%edx
>      8:      |     |  add    $0x3,%edx
>      9:      |     |  movsbl %cl,%ecx
>     10:      |     |  cmp    %ecx,%edx
>     11:      |     |  jge    1208
>     12: 0.25 | abe:|> movl   $0xdeadbeef,(%rax)
>
> The address being tested is in %rax. This is for a four-byte load, so there
> is a fast path and a slow path. The numbers to the left are samples from the
> perf sampling profiler, and indicate the cost of each line. From these it
> seems that the slow path is never taken.
>
> What is intriguing is that the compiler (clang in this case) places the slow
> path block in-line, thus the branch on line five is likely to be
> mispredicted. This is not the case for the branch on line 11.
>
> Is this the desired behavior? I.e., are these slow paths often taken?

This is the main question and I don't know the answer.
Most likely on some apps adding branch weights will be a loss and on
other apps it will be a gain.

Example:

#include <stdlib.h>
#include <stdio.h>
void store(int *a) {
  *a = 0;
}

int main(int argc, char **argv) {
  size_t off = argc == 1 ? 7 : atoi(argv[1]);
  int *x = new int[3];
  printf("offset: %zd\n", off);
  for (int i = 0; i < 100000000; i++)
    store(x + off);
  delete [] x;
}


This is the profile if I run "./a.out 0"

       │    00000000004b49e0 <store(int*)>:
 19.00 │      push   %rax
 31.11 │      mov    %rdi,%rax
       │      shr    $0x3,%rax
       │      mov    0x7fff8000(%rax),%al
 31.45 │      test   %al,%al
       │    ↓ je     21
       │      mov    %edi,%ecx
       │      and    $0x7,%ecx
       │      add    $0x3,%ecx
       │      movsbl %al,%eax
       │      cmp    %eax,%ecx
       │    ↓ jge    29
 14.82 │21:   movl   $0x0,(%rdi)
  0.11 │      pop    %rax
  3.51 │    ← retq


This is the profile if I run "./a.out 2"

       │    00000000004b49e0 <store(int*)>:
 17.38 │      push   %rax
  3.67 │      mov    %rdi,%rax
       │      shr    $0x3,%rax
       │      mov    0x7fff8000(%rax),%al
 17.50 │      test   %al,%al
       │    ↓ je     21
  7.34 │      mov    %edi,%ecx
       │      and    $0x7,%ecx
       │      add    $0x3,%ecx
 12.73 │      movsbl %al,%eax
  6.00 │      cmp    %eax,%ecx
       │    ↓ jge    29
       │21:   movl   $0x0,(%rdi)
       │      pop    %rax
 35.37 │    ← retq



>> Or is
> this layout desirable for other reasons?
>
> If this is not the desired behavior, is it a compiler problem? Should clang
> be able to recognize this pattern and automatically guess branch weights?
>
> If it's not a compiler problem, would it make sense to specify branch
> weights in AddressSanitizer? Currently, AddressSanitizer does not add any
> weights to the branches it creates.

The change to add weights is trivial.
I am not against it and the patch is welcome, but it will need to be
accompanied with SPEC performance numbers.

--kcc

>
> Cheers,
> Jonas
>
> --
> You received this message because you are subscribed to the Google Groups
> "address-sanitizer" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"address-sanitizer" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to