http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50693
--- Comment #10 from David Edelsohn <dje at gcc dot gnu.org> 2011-10-11
01:35:20 UTC ---
Sorry, I was looking at the loop1 and loop2 functions themselves, not at the
code inlined into main for the benchmark.
LLVM generates:

        movq    %r12, %rdi
        movl    $99, %esi
        movq    %rbx, %rdx
        callq   memset

GCC vectorizes loop1:

.L22:
        addq    $1, %rdx
        movdqa  %xmm0, (%rcx)
        addq    $16, %rcx
        cmpq    %rsi, %rdx
        jb      .L22

but not loop2:

.L28:
.L29:
        movb    $99, (%rbx,%rax)
        addq    $1, %rax
        cmpq    %rbp, %rax
        jne     .L28
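
For readers without the testcase at hand, here is a rough C sketch of what the listings above compute. It is not the source from this PR: the function names and signatures are made up for illustration. The first function is the kind of byte-at-a-time fill that would produce the scalar `movb $99` loop, and the second is the single memset call with fill value 99 that the LLVM output corresponds to.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scalar loop: one byte store per
   iteration, as in the `movb $99, (%rbx,%rax)` listing. */
void fill_bytewise(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 99;
}

/* Same effect expressed as one library call, which is the form the
   LLVM listing uses (value 99 in %esi, length in %rdx). */
void fill_memset(unsigned char *buf, size_t n)
{
    memset(buf, 99, n);
}
```

Both functions leave the buffer in the same state, so a compiler that recognizes the first pattern is free to emit either the call or 16-byte vector stores like the `movdqa` loop.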