On Wed, Aug 27, 2014 at 9:07 AM, Jonas Wagner <[email protected]> wrote:
> Hi,
>
> I am investigating the performance of AddressSanitizer checks. In the
> process, I've found a few checks that look as follows:
>
> 1: 0.27 | mov %rax,%rcx
> 2: 0.01 | shr $0x3,%rcx
> 3: 0.03 | mov 0x7fff8000(%rcx),%cl
> 4: 0.57 | test %cl,%cl
> 5: 0.29 | |<<je abe
> 6: | | mov %eax,%edx
> 7: | | and $0x7,%edx
> 8: | | add $0x3,%edx
> 9: | | movsbl %cl,%ecx
> 10: | | cmp %ecx,%edx
> 11: | | jge 1208
> 12: 0.25 | abe:|> movl $0xdeadbeef,(%rax)
>
> The address being tested is in %rax. This is for a four-byte load, so there
> is a fast path and a slow path. The numbers to the left are samples from the
> perf sampling profiler, and indicate the cost of each line. From these it
> seems that the slow path is never taken.
>
> What is intriguing is that the compiler (clang in this case) places the slow
> path block in-line, thus the branch on line five is likely to be
> mispredicted. This is not the case for the branch on line 11.
>
> Is this the desired behavior? I.e., are these slow paths often taken?
This is the main question and I don't know the answer.
Most likely on some apps adding branch weights will be a loss and on
other apps it will be a gain.
Example:
#include <stdlib.h>
#include <stdio.h>
void store(int *a) {
*a = 0;
}
int main(int argc, char **argv) {
size_t off = argc == 1 ? 7 : atoi(argv[1]);
int *x = new int[3];
printf("offset: %zd\n", off);
for (int i = 0; i < 100000000; i++)
store(x + off);
delete [] x;
}
This is the profile if I run "./a.out 0"
│ 00000000004b49e0 <store(int*)>:
19.00 │ push %rax
31.11 │ mov %rdi,%rax
│ shr $0x3,%rax
│ mov 0x7fff8000(%rax),%al
31.45 │ test %al,%al
│ ↓ je 21
│ mov %edi,%ecx
│ and $0x7,%ecx
│ add $0x3,%ecx
│ movsbl %al,%eax
│ cmp %eax,%ecx
│ ↓ jge 29
14.82 │21: movl $0x0,(%rdi)
0.11 │ pop %rax
3.51 │ ← retq
This is the profile if I run "./a.out 2"
│ 00000000004b49e0 <store(int*)>:
17.38 │ push %rax
3.67 │ mov %rdi,%rax
│ shr $0x3,%rax
│ mov 0x7fff8000(%rax),%al
17.50 │ test %al,%al
│ ↓ je 21
7.34 │ mov %edi,%ecx
│ and $0x7,%ecx
│ add $0x3,%ecx
12.73 │ movsbl %al,%eax
6.00 │ cmp %eax,%ecx
│ ↓ jge 29
│21: movl $0x0,(%rdi)
│ pop %rax
35.37 │ ← retq
>> Or is
> this layout desirable for other reasons?
>
> If this is not the desired behavior, is it a compiler problem? Should clang
> be able to recognize this pattern and automatically guess branch weights?
>
> If it's not a compiler problem, would it make sense to specify branch
> weights in AddressSanitizer? Currently, AddressSanitizer does not add any
> weights to the branches it creates.
The change to add weights is trivial.
I am not against it and the patch is welcome, but it will need to be
accompanied with SPEC performance numbers.
--kcc
>
> Cheers,
> Jonas
>
> --
> You received this message because you are subscribed to the Google Groups
> "address-sanitizer" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"address-sanitizer" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.