Re: i386 clang optimisation problem with stack alignment

Dimitry Andric Wed, 18 Sep 2013 14:14:22 -0700

On Sep 10, 2013, at 18:34, Tijl Coosemans <t...@freebsd.org> wrote:
> On Tue, 10 Sep 2013 18:16:01 +0200 Tijl Coosemans wrote:
>> I've attached a small test program extracted from multimedia/gstreamer-ffmpeg
>> (libavcodec/h264_cabac.c:ff_h264_init_cabac_states(H264Context *h)).
>> 
>> When you compile and run it like this on FreeBSD/i386, it results in a
>> SIGBUS:
>> 
>> % cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer 
>> % ./paddd
>> Bus error
>> 
>> The reason is this instruction where %esp isn't 16-byte aligned:
>> paddd   (%esp), %xmm7


Hmm, as far as I can see, the problem is related to position independent code, 
in combination with omitting the frame pointer:

$ cc -o paddd paddd.c -O3 -msse2 -fomit-frame-pointer
$ ./paddd
$ 

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
$ ./paddd
Bus error (core dumped)
$ 

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fno-omit-frame-pointer
$ ./paddd
$ 


>> Is this an upstream bug or is this because of local changes (to make the
>> stack 4 byte aligned by default or something)?

The change to 4-byte stack alignment on i386 comes from upstream, but we 
initiated it after a bit of discussion (see 
http://llvm.org/viewvc/llvm-project?view=revision&revision=167632 ).

Note the problem only occurs at -O3, which enables the vectorizer, so there 
might be an issue with it in combination with position independent code 
generation and omitting frame pointers.  If you check what clang passes to its 
cc1 stage with your original command line, it gives:

"/usr/bin/cc" -cc1 -triple i386-unknown-freebsd10.0 -emit-obj -disable-free 
-main-file-name paddd.c -mrelocation-model pic -pic-level 2 -pie-level 2 
-masm-verbose -mconstructor-aliases -target-cpu i486 -target-feature +sse2 -v 
-resource-dir /usr/bin/../lib/clang/3.3 -O3 -fdebug-compilation-dir 
/home/dim/bugs/paddd -ferror-limit 19 -fmessage-length 130 -mstackrealign 
-fobjc-runtime=gnustep -fobjc-default-synthesize-properties 
-fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops 
-o /tmp/paddd-zdRbKM.o -x c paddd.c

So it does pass -mstackrealign, but for some reason it isn't always effective.  
For the -fPIE -fomit-frame-pointer case, the prolog for init_states() becomes:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $28, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx

If you remove -fPIE, the data is directly accessed via its (properly 16 byte 
aligned) symbol, so there is no alignment problem:

        paddd   .LCPI0_0, %xmm7

but the stack is not realigned in the prolog either:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        movd    16(%esp), %xmm0
...

Then, if you use -fPIE, but add -fno-omit-frame-pointer:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        andl    $-16, %esp
        subl    $48, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx
.Ltmp0:

I.e., here the stack is properly realigned, and the function works fine.

In any case: yes, I think this is a bug, and we should report it upstream.  
This is a very nice test case to do so.

-Dimitry
