------- Comment #4 from zackw at panix dot com  2010-04-08 17:28 -------
(In reply to comment #0)
> When this testcase, using inline assembly, is compiled with -Os, -O2, or -O3
> it segfaults. -O0 and -O1 allow it to run correctly.
> 
> Moving the inline assembly into a separate file and including it in the
> compilation allows the program to run correctly at all -O levels.

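From these symptoms, it is practically certain that you have done something
wrong with the asm inputs and outputs: the optimizers at -Os and above act on
what the constraints claim, whereas a function in a separate file is an opaque
call that follows the normal calling convention.  I don't have an Alpha compiler
to hand, but just from looking at your code, I bet it will work correctly if you
rewrite it like so: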

unsigned long rewritten(const unsigned long b[2]) {
        unsigned long ofs, output;

        asm(
                "cmoveq %0,64,%1        # ofs    = (b[0] ? ofs : 64);\n"
                "cmoveq %0,%2,%0        # temp   = (b[0] ? b[0] : b[1]);\n"
                "cttz   %0,%0           # output = cttz(temp);\n"
                : "=r" (output), "=r" (ofs)
                : "r" (b[1]), "0" (b[0]), "1" (0)
        );
        return output + ofs;
}

(I've assumed that the semantics of "cmoveq a,b,c" are "if (a==0) c=b;")
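Under that assumed semantics, the # comments in the asm can be checked with a
host-side C model of the three instructions.  This is only a sketch, not the
pure-C version from the report (which is not reproduced here): the helper names
cmoveq, cttz and rewritten_model are hypothetical, uint64_t stands in for
Alpha's 64-bit unsigned long, and cttz(0) is assumed to be 64, as for Alpha's
CTTZ instruction.  Traced through, the sequence appears to compute the index of
the lowest set bit of the 128-bit value whose low word is b[0] and whose high
word is b[1].

```c
#include <stdint.h>

/* Hypothetical model of "cmoveq a,b,c", assuming: if (a == 0) c = b; */
static void cmoveq(uint64_t a, uint64_t b, uint64_t *c)
{
        if (a == 0)
                *c = b;
}

/* Hypothetical model of "cttz": count trailing zero bits.
   Assumption: a zero input yields 64, as on Alpha. */
static uint64_t cttz(uint64_t x)
{
        uint64_t n = 0;

        if (x == 0)
                return 64;
        while ((x & 1) == 0) {
                x >>= 1;
                n++;
        }
        return n;
}

/* Step-by-step model of the asm in rewritten(). */
static uint64_t rewritten_model(const uint64_t b[2])
{
        uint64_t output = b[0];        /* "0" (b[0]): %0 starts out holding b[0] */
        uint64_t ofs = 0;              /* "1" (0):    %1 starts out holding 0    */

        cmoveq(output, 64, &ofs);      /* cmoveq %0,64,%1 */
        cmoveq(output, b[1], &output); /* cmoveq %0,%2,%0 */
        output = cttz(output);         /* cttz   %0,%0    */
        return output + ofs;           /* the addition left to C */
}
```

For example, { 8, 1 } gives 3 (b[0] is nonzero, so ofs stays 0 and
cttz(8) is 3), while { 0, 16 } gives 68 (ofs becomes 64 and cttz(16) is 4).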

The trick with asm() is to do as little as possible.  I assume that the reason
the assembly version beats the pure-C version is the cmoveq instructions, so I
stripped out the setup code and the final addition and left those to C.  This
allows me to express the _real_ operand constraints rather than fake ones, which
lets me be confident that the optimizers will do what you want.  Note that this
also means "volatile" is unnecessary: the asm has no effects beyond its declared
outputs, so the compiler may freely move it, or delete it if the result is unused.
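The matching constraints "0" (b[0]) and "1" (0) are what let the rewritten asm
start from real inputs: each tied input is placed in the same register as the
corresponding output before the template runs.  A minimal portable sketch of
that mechanism, assuming GCC or Clang (the function name tied_identity is
hypothetical, and the empty template stands in for real instructions), shows
that when the template changes nothing, each output comes back equal to its
tied input.

```c
#include <stdint.h>

/* Hypothetical helper: an asm with an empty template whose outputs are tied
   to inputs by matching constraints ("0" and "1").  Each tied input starts
   out in the same register as its output, and the template changes nothing,
   so the outputs come back equal to the inputs.  No "volatile" is needed:
   the statement's only effect is its declared outputs.  __asm__ is GCC's
   alternate spelling of asm, usable even in strict ISO C modes. */
static void tied_identity(uint64_t in0, uint64_t in1,
                          uint64_t *out0, uint64_t *out1)
{
        uint64_t o0, o1;

        __asm__("" : "=r" (o0), "=r" (o1) : "0" (in0), "1" (in1));
        *out0 = o0;
        *out1 = o1;
}
```

In rewritten() the same tying means %0 begins as b[0] and %1 begins as 0, so
the cmoveq instructions only ever overwrite values the compiler already knows
are there.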

As a general principle, if you find yourself writing an asm() with a big long
list of earlyclobber outputs but no inputs, you are doing it wrong.


-- 

zackw at panix dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zackw at panix dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43691
