I have also posted this in gnu.gcc.help but got no replies there.

We have a very large library that is compiled on different platforms and with different compilers. A few of the routines need to be carefully coded in assembler on the x86 platform, because they make careful use of 80-bit extended-precision floating-point arithmetic. Because this is part of a larger package that is compiled with different compilers (MS and GNU are not the only compilers used), we use inline assembly for the few functions that require the careful coding, rather than separate assembly source files. We have coded this for both the Microsoft MASM inline assembler and the GNU inline assembler. One issue remains to be solved with the GNU inline assembly.
The functions that are hand-coded with inline assembly consist entirely of inline assembler code, with no C++ code in the functions whatsoever. For the GNU inline assembler, each is coded as a single __asm__("" : : : ); statement. One of the critical features of the functions, and the reason they must be hand-coded in assembler, is that they use the cdecl calling convention, so that the double values they return are returned in st(0) of the FPU and are thus returned as 80-bit values.

As an (extremely simple) example, consider this code:

    typedef const unsigned char Double80[10];
    Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29, 0x42, 0xCC, 0xFD, 0x3F}; // 0x3FFDCC42299EA1B28468, 0.39894228040143267793994605993438

    #if defined(__GNUC__)

    #if 1
    #define ASM_CALLING_CONVENTION cdecl
    #define __ASM_NAKED_RETURN \
        "\n\t" "leave" \
        "\n\t" "ret"
    #elif 0
    #define ASM_CALLING_CONVENTION cdecl, always_inline
    #define __ASM_NAKED_RETURN
    #else
    #define ASM_CALLING_CONVENTION naked
    #define __ASM_NAKED_RETURN
    #endif

    inline double oneOverRootTwoPi() __attribute__(( ASM_CALLING_CONVENTION ));
    inline double oneOverRootTwoPi()
    {
        __asm__(
            "fldt %[s_oneOverRootTwoPi]"
            __ASM_NAKED_RETURN
            :
            : [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
            : "st(7)"
        );
    }

    #endif //#if defined(__GNUC__)

The intent of the function oneOverRootTwoPi() is to allow the use of the constant as an 80-bit constant within expressions, and one would like it to be inlined. However, I do not know how to tell the compiler that my __asm__ code is already setting up the return value in st(0). If I do not set __ASM_NAKED_RETURN as above (and I really shouldn't be doing that), the compiler insists on loading a default return value into st(0) (a NaN, I think) before returning, even though I have nowhere instructed it to do so. To avoid that, I put in my own return before the compiler-generated load-and-return code can execute. That approach of course precludes inlining the function.
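As a sanity check on the byte layout of the constant above, here is a minimal portable sketch that decodes the ten bytes without any assembler. It assumes the usual x87 extended format (little-endian, 64-bit significand with an explicit integer bit, 15-bit exponent biased by 16383, sign in the top bit) and handles normal numbers only. The helper name decodeDouble80 is hypothetical and not part of our library.

```cpp
#include <cmath>
#include <cstdint>

typedef const unsigned char Double80[10];

Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E,
                                0x29, 0x42, 0xCC, 0xFD, 0x3F };

// Hypothetical helper: decode a little-endian x87 extended-precision value.
// Normal numbers only; zero, denormals, infinities and NaNs are not handled.
long double decodeDouble80(const unsigned char* bytes)
{
    // Bytes 0..7 hold the 64-bit significand, least significant byte first.
    std::uint64_t significand = 0;
    for (int i = 7; i >= 0; --i)
        significand = (significand << 8) | bytes[i];

    // Bytes 8..9 hold the sign bit and the 15-bit biased exponent.
    const unsigned signAndExponent =
        bytes[8] | (static_cast<unsigned>(bytes[9]) << 8);
    const int exponent = static_cast<int>(signAndExponent & 0x7FFFu) - 16383;

    // The significand has its binary point after bit 63, hence the extra -63.
    const long double magnitude =
        std::ldexp(static_cast<long double>(significand), exponent - 63);

    return (signAndExponent & 0x8000u) ? -magnitude : magnitude;
}
```

Calling decodeDouble80(s_oneOverRootTwoPi) should give a value close to 1/sqrt(2*pi). On a platform where long double is only 64 bits wide, the low bits of the significand are rounded away, so the result is only as accurate as a double.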
So how do I instruct the compiler NOT to load its own return value into st(0) before returning? How do I tell it that my __asm__ code has already loaded the return value into st(0)? I am guessing there is some output constraint I can add, but I could not find anything.

The above example is very, very simple. We do have a few substantial functions, hand-coded in assembler, that suffer from the same issue.

A bit more detail on the above, with specifics. This source:

    typedef const unsigned char Double80[10];
    Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29, 0x42, 0xCC, 0xFD, 0x3F}; // 0x3FFDCC42299EA1B28468, 0.39894228040143267793994605993438

    double oneOverRootTwoPi() __attribute__(( cdecl ));
    double oneOverRootTwoPi()
    {
        register double dReturnValue;
        __asm__(
            "fldt %[s_oneOverRootTwoPi]"
            : "=t" (dReturnValue)
            : [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
            :
        );
        return dReturnValue;
    }

generates the following atrocious code, which COMPLETELY defeats the purpose of the function, because the value is stored to and reloaded from memory as a 64-bit double:

    Dump of assembler code for function _Z14oneOverRootTwov:
       0x00401412 <+0>:     push   %ebp
       0x00401413 <+1>:     mov    %esp,%ebp
       0x00401415 <+3>:     push   %esi
       0x00401416 <+4>:     push   %ebx
    => 0x00401417 <+5>:     sub    $0x10,%esp
       0x0040141a <+8>:     fldt   0x40407a
       0x00401420 <+14>:    fstpl  -0x10(%ebp)
       0x00401423 <+17>:    mov    -0x10(%ebp),%ebx
       0x00401426 <+20>:    mov    -0xc(%ebp),%esi
       0x00401429 <+23>:    mov    %ebx,%eax
       0x0040142b <+25>:    mov    %esi,%edx
       0x0040142d <+27>:    mov    %eax,-0x18(%ebp)
       0x00401430 <+30>:    mov    %edx,-0x14(%ebp)
       0x00401433 <+33>:    fldl   -0x18(%ebp)
       0x00401436 <+36>:    add    $0x10,%esp
       0x00401439 <+39>:    pop    %ebx
       0x0040143a <+40>:    pop    %esi
       0x0040143b <+41>:    leave
       0x0040143c <+42>:    ret
    End of assembler dump.
The following pedantically ugly source:

    typedef const unsigned char Double80[10];
    Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29, 0x42, 0xCC, 0xFD, 0x3F}; // 0x3FFDCC42299EA1B28468, 0.39894228040143267793994605993438

    double oneOverRootTwoPi() __attribute__(( cdecl ));
    double oneOverRootTwoPi()
    {
        __asm__(
            "fldt %[s_oneOverRootTwoPi]"
            "\n\t" "leave"
            "\n\t" "ret"
            :
            : [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
            :
        );
    }

generates the following almost-ideal code, which nevertheless cannot be inlined:

    Dump of assembler code for function _Z14oneOverRootTwov:
       0x004013e2 <+0>:     push   %ebp
       0x004013e3 <+1>:     mov    %esp,%ebp
    => 0x004013e5 <+3>:     fldt   0x40407a
       0x004013eb <+9>:     leave
       0x004013ec <+10>:    ret
       0x004013ed <+11>:    flds   0x40473c
       0x004013f3 <+17>:    leave
       0x004013f4 <+18>:    ret
    End of assembler dump.

The problem is, WHY DOES THE GNU COMPILER INSIST ON GENERATING THE "flds 0x40473c" before the return code? Nowhere does the source say "return xxx;", so why does GCC take it upon itself to return a default value? If I forget to (or, in this case, intentionally do not) return a value, why doesn't the compiler simply issue me a stern warning, but still do what I instruct it to do?

Acceptable code would be:

    Dump of assembler code for function _Z14oneOverRootTwov:
       0x004013e2 <+0>:     push   %ebp
       0x004013e3 <+1>:     mov    %esp,%ebp
       0x004013e5 <+3>:     fldt   0x40407a
       0x004013eb <+9>:     leave
       0x004013ec <+10>:    ret
    End of assembler dump.

Ideal code would be:

    Dump of assembler code for function _Z14oneOverRootTwov:
       0x004013e5 <+3>:     fldt   0x40407a
       0x004013ec <+10>:    ret
    End of assembler dump.

so long as, in both cases, the COMPILER is what generates the return code, so that inlining would happen correctly.

This is where the "naked" attribute would be handy. I understand the developers' reluctance to implement it, but what is my alternative?

_______________________________________________
help-gplusplus mailing list
help-gplusplus@gnu.org
https://lists.gnu.org/mailman/listinfo/help-gplusplus