Hello, I am working on a larger project that processes audio signals stored as doubles. When I estimated the maximum amplitude of a buffer, the computation returned a faulty value.
I debugged the assembler code and realized that the optimizer generates a faulty computation of the maximum. Changing a single instruction makes the computation correct. I've reduced the code to a minimum, replacing the function calls that fill the buffer with a simple memcpy. The code is standalone. I'll include factor.ii, which is also a command-line program demonstrating the error. I compiled it with the command line found below. I also tried leaving out the compiler flag -march=i686. That changed the assembler code but didn't get rid of the error. Further down I include my analysis of the assembler code.

g++ -o factor -Wall -v -O2 -march=i686 factor.cpp:

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --disable-libmudflap --enable-targets=all --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.3 (Debian 4.2.3-3)
 /usr/lib/gcc/i486-linux-gnu/4.2.3/cc1plus -E -quiet -v -D_GNU_SOURCE factor.cpp -march=i686 -Wall -O2 -fpch-preprocess -o factor.ii
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/4.2
 /usr/include/c++/4.2/i486-linux-gnu
 /usr/include/c++/4.2/backward
 /usr/local/include
 /usr/lib/gcc/i486-linux-gnu/4.2.3/include
 /usr/include
End of search list.
 /usr/lib/gcc/i486-linux-gnu/4.2.3/cc1plus -fpreprocessed factor.ii -quiet -dumpbase factor.cpp -march=i686 -auxbase factor -O2 -Wall -version -o factor.s
GNU C++ version 4.2.3 (Debian 4.2.3-3) (i486-linux-gnu)
        compiled by GNU C version 4.2.3 (Debian 4.2.3-3).
GGC heuristics: --param ggc-min-expand=45 --param ggc-min-heapsize=29241
Compiler executable checksum: f63294e1c8ecc1bf2473a5bae1642fbe
 as -V -Qy -o factor.o factor.s
GNU assembler version 2.18.0 (i486-linux-gnu) using BFD version (GNU Binutils for Debian) 2.18.0.20080103
 /usr/lib/gcc/i486-linux-gnu/4.2.3/collect2 --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 -o factor /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/crt1.o /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/crti.o /usr/lib/gcc/i486-linux-gnu/4.2.3/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.2.3 -L/usr/lib/gcc/i486-linux-gnu/4.2.3 -L/usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.2.3/../../..
factor.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i486-linux-gnu/4.2.3/crtend.o /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/crtn.o

The optimizer generates the error with the following line of code:

for (unsigned long i = 0; i < insize; ++i)
    _factor = std::max(_factor, std::max(-data[i], data[i]));

with

double * data;          (audio data)
unsigned long insize;   (number of doubles in the buffer)
double _factor;         (currently estimated, global factor)

Generated assembler code from factor.s:

// Load the data buffer pointer into eax
movl -36(%ebp), %eax
// Clear edx (that's our i)
xorl %edx, %edx
// Load _factor onto the FP stack
fldl (%esi)
// Have _factor twice on the FP stack (Stack: (original)_factor, (current)_factor)
fld %st(0)
// Start the for-loop
jmp .L45
.p2align 4,,7
.L64:
// This runs at the beginning of every iteration i >= 1
// This is the actual consistency error
// At this point we have the following on the FP stack: st(0): (current)_factor, st(1): (original)_factor
// After executing this, we'll have st(0): (original)_factor, st(1): (current)_factor, which creates a problem further down
// To show what is actually going wrong, I label _factor
// with "original" for the factor that was present when the
// for-loop was started and "current" for the value that is actually the
// correct factor.
// Note: To fix the bug, this instruction needs to be fstl %st(1) to have
// the current factor in BOTH st(0) and st(1)
fxch %st(1)
.L45:
// Load the value at the current data buffer position onto the stack
// FP-Stack: st(0): data[i], st(1): (orig)_factor, st(2): (curr)_factor
fldl (%eax)
// Negate
// FP-Stack: st(0): -data[i], st(1): (orig)_factor, st(2): (curr)_factor
fchs
// Save -data[i] to ??? - (why???)
fstl -16(%ebp)
// Load another copy of the current data buffer value
// FP-Stack: st(0): data[i], st(1): -data[i], st(2): (orig)_factor, st(3): (curr)_factor
fldl (%eax)
// This moves the value max(data[i], -data[i]) into st(0)
// FP-Stack: st(0): max(data[i], -data[i]), st(1): -data[i], st(2): (orig)_factor, st(3): (curr)_factor
fucomi %st(1), %st
fcmovbe %st(1), %st
// Move the estimated max to st(1) and remove st(0)
// FP-Stack: st(0): max(data[i], -data[i]), st(1): (orig)_factor, st(2): (curr)_factor
fstp %st(1)
// Now compare max(data[i], -data[i]) and (curr)_factor
fucomi %st(2), %st
// Overwrite st(2), which held (curr)_factor, with max(data[i], -data[i]) and remove st(0)
// FP-Stack: st(0): (orig)_factor, st(1): max(data[i], -data[i])
fstp %st(2)
// Exchange st(0) and st(1)
// FP-Stack: st(0): max(data[i], -data[i]), st(1): (orig)_factor
fxch %st(1)
// Now if (max(data[i], -data[i]) <= (curr)_factor) then write (orig)_factor to st(0)
// FP-Stack: st(0): (max(data[i], -data[i]) <= (curr)_factor ? (orig)_factor : max(data[i], -data[i])), st(1): (orig)_factor
// If everything were correct, st(0) would contain (curr)_factor now!
fcmovbe %st(1), %st
// Increment our i
addl $1, %edx
// Increment our data buffer pointer
addl $8, %eax
// Check if we need another iteration
cmpl %ebx, %edx
// If we do, jump
jne .L64
// We are finished and have a faulty (if there was more than one iteration) (curr)_factor
// and (orig)_factor on the stack. This overwrites (orig) and leaves just (curr) on the stack
fstp %st(1)
// Write the result to memory (variable _factor)
fstpl (%esi)

--
           Summary: Optimizer generates faulty assembler code when estimating max. floating point value in a for loop
           Product: gcc
           Version: 4.2.3
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tcm dot home at gmx dot de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36953