This looks like a valgrind bug to me.

I can reproduce the problem with this simple program, which shows the
issue at any optimisation level.

int main ()
{
 asm volatile ("" : : : "r4", "r5");
 return 0;
}

[on my Raspberry Pi, with the system gcc]
$ gcc test.c -mtune=cortex-a15 -marm
$ valgrind ./a.out
==15850== Memcheck, a memory error detector
==15850== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==15850== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==15850== Command: ./a.out
==15850==
==15850== Invalid write of size 4
==15850==    at 0x103E8: main (in /home/cgb23/a.out)
==15850==  Address 0xbdcf34a4 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes
...

000103e8 <main>:
   103e8:       e16d40fc        strd    r4, [sp, #-12]!
   103ec:       e58db008        str     fp, [sp, #8]
   103f0:       e28db008        add     fp, sp, #8
   103f4:       e3a03000        mov     r3, #0
   103f8:       e1a00003        mov     r0, r3
   103fc:       e24bd008        sub     sp, fp, #8
   10400:       e1cd40d0        ldrd    r4, [sp]
   10404:       e59db008        ldr     fp, [sp, #8]
   10408:       e28dd00c        add     sp, sp, #12
   1040c:       e12fff1e        bx      lr

Without looking at the valgrind sources, I'd guess that valgrind isn't
handling the strd instruction correctly. "size 4" doesn't look right
for an strd, which stores two 32-bit registers (8 bytes), and valgrind
may also not be accounting for the writeback to the stack pointer
correctly. Searching online, I found this report on the valgrind
mailing list:
https://sourceforge.net/p/valgrind/mailman/message/34632852/
It appears to describe the same issue, but did not attract any
attention. A brief look at the patch attached there suggests that the
problem lies in the way valgrind handles writes to the stack with
negative offsets and writeback.

The suggested --workaround-gcc296-bugs=yes option does seem to
suppress the error. Alternatively, since the compiler only uses
STRD/LDRD in the prologue and epilogue when compiling for cores with
an out-of-order microarchitecture, you can work around the problem by
compiling with -mcpu=cortex-a7, in which case it will use PUSH and POP
instead.



On 9 June 2016 at 22:22, William Mills <wmi...@ti.com> wrote:
> Hello,
>
> We have been using Linaro GCC 5.x[1] and valgrind.
>
> When the optimizer is turned on valgrind complains about writes beyond
> the current stack pointer.  With the optimizer off, the problem report
> goes away.
>
> I have my own conclusion about what is going on but I won't bias you
> with it.  Here are the facts:
>
> All files and logs attached as 10K tar.gz if it survives this maillist.
>
> test.c:
> #include <stdio.h>
>
> int  main(int argc,char** argv)
> {
>         int i;
>
>         for (i = 1; i < argc; i++) {
>                 printf("argument is %s\n", argv[i]);
>         }
>
>         return 0;
> }
>
> $ arm-linux-gnueabihf-gcc -march=armv7ve -marm -mfpu=neon  \
>   -mfloat-abi=hard -mcpu=cortex-a15 -O2 -g \
>   -o test-fail test.c
>
>
> $ valgrind --leak-resolution=high --track-origins=yes \
> --trace-children=yes --leak-check=full --error-limit=no \
>  ./test-fail arg1 arg2 arg3
>
> ==20011== Memcheck, a memory error detector
> ==20011== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==20011== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
> ==20011== Command: ./test-fail arg1 arg2 arg3
> ==20011==
> ==20011== Invalid write of size 4
> ==20011==    at 0x10300: main (test.c:4)
> ==20011==  Address 0xbdbfcb58 is on thread 1's stack
> ==20011==  24 bytes below stack pointer
> ==20011==
>
> 000102f8 <main>:
>    102f8:       e3500001        cmp     r0, #1
>    102fc:       da000014        ble     10354 <main+0x5c>
>    10300:       e16d41f8        strd    r4, [sp, #-24]! ; 0xffffffe8
> ^^^^^^^^  Complaint is here
>
>    10304:       e1a05001        mov     r5, r1
>    10308:       e3a04001        mov     r4, #1
>    1030c:       e1cd60f8        strd    r6, [sp, #8]
>    10310:       e300748c        movw    r7, #1164       ; 0x48c
>    10314:       e1a06000        mov     r6, r0
>    10318:       e3407001        movt    r7, #1
>    1031c:       e58d8010        str     r8, [sp, #16]
>    10320:       e58de014        str     lr, [sp, #20]
>    10324:       e2844001        add     r4, r4, #1
>    10328:       e5b51004        ldr     r1, [r5, #4]!
>    1032c:       e1a00007        mov     r0, r7
>    10330:       ebffffe4        bl      102c8 <printf@plt>
>    10334:       e1560004        cmp     r6, r4
>    10338:       1afffff9        bne     10324 <main+0x2c>
>    1033c:       e1cd40d0        ldrd    r4, [sp]
>    10340:       e3a00000        mov     r0, #0
>    10344:       e1cd60d8        ldrd    r6, [sp, #8]
>    10348:       e59d8010        ldr     r8, [sp, #16]
>    1034c:       e28dd014        add     sp, sp, #20
>    10350:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)
>    10354:       e3a00000        mov     r0, #0
>    10358:       e12fff1e        bx      lr
>
> Without the optimizer, the code looks different and valgrind does not
> issue any errors.
>
> 000103d8 <main>:
>    103d8:       e52db008        str     fp, [sp, #-8]!
> ^^^^^^^ Valgrind does not complain about this
>
>    103dc:       e58de004        str     lr, [sp, #4]
>    103e0:       e28db004        add     fp, sp, #4
>    103e4:       e24dd010        sub     sp, sp, #16
>    103e8:       e50b0010        str     r0, [fp, #-16]
>    103ec:       e50b1014        str     r1, [fp, #-20]  ; 0xffffffec
>    103f0:       e3a03001        mov     r3, #1
>    103f4:       e50b3008        str     r3, [fp, #-8]
>    103f8:       ea00000b        b       1042c <main+0x54>
>    103fc:       e51b3008        ldr     r3, [fp, #-8]
>    10400:       e1a03103        lsl     r3, r3, #2
>    10404:       e51b2014        ldr     r2, [fp, #-20]  ; 0xffffffec
>    10408:       e0823003        add     r3, r2, r3
>    1040c:       e5933000        ldr     r3, [r3]
>    10410:       e1a01003        mov     r1, r3
>    10414:       e30004a4        movw    r0, #1188       ; 0x4a4
>    10418:       e3400001        movt    r0, #1
>    1041c:       ebffffa9        bl      102c8 <printf@plt>
>    10420:       e51b3008        ldr     r3, [fp, #-8]
>    10424:       e2833001        add     r3, r3, #1
>    10428:       e50b3008        str     r3, [fp, #-8]
>    1042c:       e51b2008        ldr     r2, [fp, #-8]
>    10430:       e51b3010        ldr     r3, [fp, #-16]
>    10434:       e1520003        cmp     r2, r3
>    10438:       baffffef        blt     103fc <main+0x24>
>    1043c:       e3a03000        mov     r3, #0
>    10440:       e1a00003        mov     r0, r3
>    10444:       e24bd004        sub     sp, fp, #4
>    10448:       e59db000        ldr     fp, [sp]
>    1044c:       e28dd004        add     sp, sp, #4
>    10450:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)
>
>
> [1] 5.3-2016.02 for Yocto-project and cross-compile
> 5.2 on the ARM target "since Linaro hasn’t yet fixed building 5.3 from
> recipes yet."
> Both versions give the same results for this test program.
>
> ----------------
> William A. Mills
> Chief Technologist, Open Solutions, SDO
> Texas Instruments, Inc.
> 20450 Century Blvd
> Germantown MD 20878
> 240-643-0836
>
> _______________________________________________
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-toolchain
>