On 02/19/2015 12:25 PM, Ramana Radhakrishnan wrote:
On Thu, Feb 19, 2015 at 9:17 AM, Marat Zakirov <m.zaki...@samsung.com> wrote:
Hi all!

During my investigation I found that GCC does not perform load/store
widening (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65088). Could you
please confirm whether that is the case? Are there any plans to implement
it? I would also like to know whether there is any need to do load/store
widening exclusively in the ASan phase, just to reduce the number of
ASAN_CHECKS.

Example from the bug:

$ cat t2.c

int a[2];
int b[2];

int main ()
{
   b[0] = a[0];
   b[1] = a[1];
   return 0;
}

The answer is: it depends. GCC can have SLP spot this in a generic form
across ports, as in the example below.


AArch64 :

main:
     adrp    x0, a    // 5    *movdi_aarch64/11    [length = 4]
     add    x0, x0, :lo12:a    // 6    add_losym_di    [length = 4]
     adrp    x1, b    // 8    *movdi_aarch64/11    [length = 4]
     add    x1, x1, :lo12:b    // 9    add_losym_di    [length = 4]
     ldr    d0, [x0]    // 7    *aarch64_simd_movv2si/1    [length = 4]
     mov    w0, 0    // 15    *movsi_aarch64/4    [length = 4]
     str    d0, [x1]    // 10    *aarch64_simd_movv2si/2    [length = 4]
     ret    // 40    simple_return    [length = 4]


Or, on AArch32 without NEON, the standard ldm / ldrd peepholes spot this:

main:
     @ args = 0, pretend = 0, frame = 0
     @ frame_needed = 0, uses_anonymous_args = 0
     @ link register save eliminated.
     movw    r2, #:lower16:a
     movw    r3, #:lower16:b
     movt    r2, #:upper16:a
     movt    r3, #:upper16:b
     ldmia    r2, {r1, r2}
     mov    r0, #0
     stmia    r3, {r1, r2}
     bx    lr


It would be interesting to see whether the number of checks can be reduced,
but I suspect you'll hit quite a few phase-ordering issues, and there will be
enough variation between ports to make it hard to get this working sensibly.
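
To make the "number of checks" point concrete, here is a rough model of it. This is only a sketch: model_check, checks_for_narrow_copy and checks_for_wide_copy are hypothetical names invented for illustration, and model_check merely counts calls. It is not how GCC implements ASAN_CHECK. It assumes one check per instrumented load or store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int src[2] = {1, 2};
static int dst[2];
static unsigned check_count; /* number of modelled checks executed */

/* Hypothetical stand-in for one sanitizer check on [addr, addr + size).
   It only counts; a real check would consult shadow memory. */
static void model_check(const void *addr, size_t size)
{
    assert(addr != NULL && size > 0);
    check_count++;
}

/* Element-wise copy: one check per load and one per store. */
unsigned checks_for_narrow_copy(void)
{
    check_count = 0;
    for (int i = 0; i < 2; i++) {
        model_check(&src[i], sizeof src[i]); /* load check */
        model_check(&dst[i], sizeof dst[i]); /* store check */
        dst[i] = src[i];
    }
    return check_count;
}

/* Widened copy: one check covering each whole array. */
unsigned checks_for_wide_copy(void)
{
    check_count = 0;
    model_check(src, sizeof src); /* single load check */
    model_check(dst, sizeof dst); /* single store check */
    memcpy(dst, src, sizeof src);
    return check_count;
}
```

Under this model the narrow copy performs four checks and the widened copy two. Whether a real compiler pass could achieve that reduction is exactly the phase-ordering question raised above.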



regards
Ramana


$ gcc t2.c -O3 -S

$ cat t2.s

...

main:
.LFB0:
         .cfi_startproc
         movl    a(%rip), %eax
         movl    %eax, b(%rip)
         movl    a+4(%rip), %eax
         movl    %eax, b+4(%rip)
         xorl    %eax, %eax
         ret
         .cfi_endproc
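
For comparison, the widened form I am asking about can be written by hand. This is only a sketch: copy_narrow and copy_wide are names made up for illustration, and it assumes a 4-byte int so that the two elements fit in one 8-byte value. memcpy is used instead of a pointer cast to stay within the aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

int a[2];
int b[2];

/* Narrow form, as in t2.c: two separate 4-byte loads and stores. */
void copy_narrow(void)
{
    b[0] = a[0];
    b[1] = a[1];
}

/* Hand-widened form: one 8-byte load followed by one 8-byte store. */
void copy_wide(void)
{
    uint64_t tmp;
    assert(sizeof tmp == sizeof a); /* assumption: int is 4 bytes */
    memcpy(&tmp, a, sizeof tmp);    /* read both elements at once */
    memcpy(b, &tmp, sizeof tmp);    /* write both elements at once */
}
```

Both functions leave b with the same contents. Whether the compiler turns copy_wide into single 8-byte moves depends on the target and optimization level, so the generated assembly should be checked rather than assumed.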



I would very much appreciate your answers and thoughts.

--Marat

Thank you very much Ramana.
I would also like the x86 maintainers to explain why GCC on x86 didn't handle the given example.

--Marat
