On 07.31, Gwenole Beauchesne wrote:
> On Thu, 31 Jul 2003, J.A. Magallon wrote:
> 
> > a) With pentium3, gcc spits sse instructions for moves, like movaps.
> >    Sure it should be doing this ? If I build with -mno-sse, my code works
> >    again. But -msse should only activate/deactivate the use of
> >    __builtin_movaps() and friends. I am also developing the use of SSE
> >    for my vector math, so I _need_ -msse. How can I just enable the builtins,
> >    but prevent gcc to spit its own sse instructions ?
> 
> Indeed, the documentation states that -msse enables the intrinsics. But I
> believe this changed over time. Please, can I get a testcase of that one too?
> 
> > b) Automatic use of SSE would be ok if gcc is sure arguments are aligned,
> >    or if it uses unaligned moves. But I have seen gcc do things like:
> > 
> >    movaps  %xmm0, -120(%ebp)
> > 
> >    Who knows how is that aligned ? Even if stack is 16-byte aligned,
> >    120 is not a multiple of 16... Changing that to movups and compiling
> >    the assembler, things work. So obviously it was not aligned.
> 
> Can you please provide me with an appropriate testcase?
> 

I got it !! I have removed all the const& and the hand-optimized code, and gcc
still generates a movaps.

// bug.cc

class Vector
{
public:
        float f[4];
        Vector()
        {
                f[0]=f[1]=f[2]=f[3]=0;
        };
        Vector(float a,float b,float c,float d)
        {   
                f[0]=a; f[1]=b; f[2]=c; f[3]=d;
        };
        Vector operator-() const
        {
                return Vector(-f[0],-f[1],-f[2],-f[3]);
        };   
};

class Ray
{
public:
        Vector o,d;
};

class Env
{
public:
        Vector e,v,n,ng;
        void set(const Ray& r);
};

Vector fn()
{
        return Vector(0,0,0,0);
}

void Env::set(const Ray& r)
{
        e = r.o;
        v = -r.d;
        n = fn();
        ng = -n;
}

Build with

g++ -O2 -march=pentium3 -S bug.cc

and look at the code in Env::set():

...
.LC1:
    .long   -2147483648
    .long   0
    .long   0
    .long   0
    .text
    .align 2
    .p2align 4,,15
.globl _ZN3Env3setERK3Ray
    .type   _ZN3Env3setERK3Ray, @function
...
    movss   .LC1, %xmm0
... 
    movaps  %xmm0, -56(%ebp)

It looks as if, when gcc has to reuse the float constant in .LC1 (0x80000000,
i.e. just the sign bit, -0.0), it loads it from the data segment into an xmm
register and then puts that back onto the stack with an aligned sse move.
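
To make the alignment question concrete, here is a minimal sketch of the check
that movaps implies for its memory operand. The helper names and the example
addresses are made up for illustration; they do not come from the gcc output
above. It only shows that whether -56(%ebp) is 16-byte aligned depends on the
value of %ebp as well as on the offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of 16, which is what
// movaps requires of a memory operand (movups has no such requirement).
constexpr bool is_aligned16(std::uintptr_t addr)
{
    return (addr & 15u) == 0;
}

// Address of a stack slot at a signed offset from the frame pointer.
// Unsigned arithmetic wraps, so adding a negative offset works as expected.
constexpr std::uintptr_t slot_address(std::uintptr_t ebp, std::intptr_t offset)
{
    return ebp + static_cast<std::uintptr_t>(offset);
}

// The frame pointer values below are invented, not measured.
inline void alignment_examples()
{
    // %ebp is a multiple of 16: the slot at -56 ends up at 8 mod 16.
    assert(!is_aligned16(slot_address(0x1000, -56)));
    // %ebp is 8 mod 16: the same offset now lands on a 16-byte boundary.
    assert(is_aligned16(slot_address(0x1008, -56)));
}
```

So the offset alone (-56 here, -120 in my earlier mail) does not settle it;
what matters is whether %ebp really has the alignment gcc assumed when it laid
out the frame.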

Now that I know what's going on, I have to find a solution other than
building with -O1... finding the individual -f optimization flag that causes this ?

Hope this helps.

Would you file the bug with gcc ? If so, please add me to the CCs. If not, I
will do it.

TIA

BTW: is there any #pragma or anything similar to say "this file must be
compiled at -O1" ?
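
For reference, a sketch of what this looks like in later compilers: GCC 4.4
and newer have an optimize pragma and a matching function attribute. This is
an assumption about the toolchain, since the gcc 3.3.1 used here does not
have them, and the function names below are made up for illustration.

```cpp
#include <cassert>

// Assumption: GCC 4.4 or newer. gcc 3.3.1 does not know these pragmas.
// Everything between push_options and pop_options is compiled at -O1,
// whatever -O level is given on the command line.
#pragma GCC push_options
#pragma GCC optimize ("O1")
float negate_pragma_o1(float x)
{
    return -x;
}
#pragma GCC pop_options

// The same request for a single function, using the attribute form.
__attribute__((optimize("O1"))) float negate_attr_o1(float x)
{
    return -x;
}

// Both variants must of course still compute the same result.
inline void options_sketch_check()
{
    assert(negate_pragma_o1(2.0f) == -2.0f);
    assert(negate_attr_o1(2.0f) == -2.0f);
}
```

Putting the push_options/optimize pair at the top of a file and pop_options at
the bottom would cover the whole file.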

-- 
J.A. Magallon <[EMAIL PROTECTED]>      \          Software is like sex:
werewolf.able.es                         \    It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-pre9-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-0.7mdk))
