------- Comment #4 from zackw at panix dot com  2010-04-08 17:28 -------
(In reply to comment #0)
> When this testcase, using inline assembly, is compiled with -Os, -O2, or -O3
> it segfaults.  -O0 and -O1 allow it to run correctly.
>
> Moving the inline assembly into a separate file and including it in the
> compilation allow the program to run correctly at all -O levels.

From these symptoms, it is practically certain that you have done
something wrong with the asm inputs and outputs.  I don't have an Alpha
compiler to hand, but just from looking at your code, I bet it will work
correctly if you rewrite it like so:

unsigned long rewritten(const unsigned long b[2])
{
    unsigned long ofs, output;

    asm(
        "cmoveq %0,64,%1        # ofs    = (b[0] ? ofs : 64);\n"
        "cmoveq %0,%2,%0        # temp   = (b[0] ? b[0] : b[1]);\n"
        "cttz   %0,%0           # output = cttz(temp);\n"
        : "=r" (output), "=r" (ofs)
        : "r" (b[1]), "0" (b[0]), "1" (0)
    );
    return output + ofs;
}

(I've assumed that the semantic of "cmoveq a,b,c" is "if (a==0) c=b;")

The trick with asm() is to do as little as possible.  I assume that the
reason the assembly version beats the pure-C version is the cmoveq's, so
I stripped the setup code and the addition.  This allows me to express
the _real_ argument constraints rather than fake ones, which lets me be
confident that the optimizers will do what you want.  Note that this
also means "volatile" is unnecessary.

As a general principle, if you find yourself writing an asm() with a big
long list of earlyclobber outputs but no inputs, you are doing it wrong.


-- 

zackw at panix dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zackw at panix dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43691
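
For readers without an Alpha toolchain, the comments in the rewritten()
asm above can be transcribed into portable C as a rough cross-check.
This is only a sketch: the names reference_cttz128 and
reference_cttz128_selfcheck are hypothetical and not part of the
original report, uint64_t stands in for Alpha's 64-bit unsigned long,
and it assumes that cttz yields 64 for a zero input.  It relies on the
GCC builtin __builtin_ctzll, which is undefined for zero, so that case
is handled explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable reference for what rewritten() is meant to
 * compute: the number of trailing zero bits in the 128-bit value
 * whose low word is b[0] and whose high word is b[1]. */
unsigned reference_cttz128(const uint64_t b[2])
{
    /* First cmoveq: ofs stays 0 unless the low word is zero. */
    unsigned ofs = b[0] ? 0u : 64u;

    /* Second cmoveq: scan the low word unless it is zero. */
    uint64_t temp = b[0] ? b[0] : b[1];

    /* cttz: __builtin_ctzll(0) is undefined, so zero is special-cased.
     * ASSUMPTION: the hardware instruction returns 64 for zero. */
    unsigned tz = temp ? (unsigned)__builtin_ctzll(temp) : 64u;

    return tz + ofs;
}

/* Small usage example; aborts if the sketch disagrees with the
 * hand-computed answers. */
void reference_cttz128_selfcheck(void)
{
    const uint64_t low_set[2]  = { 8, 0 };   /* bit 3 of the low word */
    const uint64_t high_set[2] = { 0, 1 };   /* bit 0 of the high word */

    assert(reference_cttz128(low_set) == 3);
    assert(reference_cttz128(high_set) == 64);
}
```

Comparing an asm version against a reference like this at every -O
level is one way to catch constraint mistakes of the kind described in
the comment, since a wrong constraint typically shows up only once the
optimizers start reallocating registers.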