------- Comment #2 from jacob at math dot jussieu dot fr 2007-09-30 09:16 -------
Here are some thoughts about why it is so fast with g++-4.2, perhaps related to why it segfaults.
My library is an Expression Templates library. When you write m1+m2 with matrices m1 and m2, it does not compute the sum of these two matrices. Instead it constructs a new object of type (roughly) Sum<Matrix,Matrix> and passes references to m1 and m2 to its constructor. When you then write m3=m1+m2, it (roughly) calls Matrix::operator=(Sum<...>), which calls Sum<...>::read() to evaluate the entries of the matrix sum. It is very important that the compiler be clever enough to understand that the objects of type Sum<...> are short-lived, so that it doesn't need to emit any code for them in the final binary. g++ 4.1 didn't understand that, so it produced slow code. g++ 4.2 does understand it, and optimizes accordingly. That explains why 4.2 produces 4x faster code in my benchmarks. But I am afraid that I might be hitting a bug in this optimization.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33599
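
To make the mechanism described above concrete, here is a minimal sketch of the idea. It is not the library's actual code: only the names Matrix, Sum, operator= and read() come from the description, while rows(), cols(), the (row, col) signature of read(), the storage layout and the helper demo_sum_entry() are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lazy node standing for "lhs + rhs". It only stores references to its
// operands; no entry is computed until read() is called.
// (Sketch only: rows()/cols() and read(row, col) are assumed names.)
template <typename Lhs, typename Rhs>
class Sum {
public:
    Sum(const Lhs& lhs, const Rhs& rhs) : lhs_(lhs), rhs_(rhs) {}

    std::size_t rows() const { return lhs_.rows(); }
    std::size_t cols() const { return lhs_.cols(); }

    // Evaluate one entry of the sum on demand.
    double read(std::size_t row, std::size_t col) const {
        return lhs_.read(row, col) + rhs_.read(row, col);
    }

private:
    const Lhs& lhs_;
    const Rhs& rhs_;
};

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, double value = 0.0)
        : rows_(rows), cols_(cols), data_(rows * cols, value) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    double read(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }

    // Assigning from an expression is where the work happens: each entry
    // is pulled out of the Sum object through read().
    template <typename Lhs, typename Rhs>
    Matrix& operator=(const Sum<Lhs, Rhs>& expr) {
        assert(expr.rows() == rows_ && expr.cols() == cols_);
        for (std::size_t r = 0; r < rows_; ++r)
            for (std::size_t c = 0; c < cols_; ++c)
                data_[r * cols_ + c] = expr.read(r, c);
        return *this;
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// m1 + m2 builds a short-lived Sum object instead of a new Matrix.
inline Sum<Matrix, Matrix> operator+(const Matrix& a, const Matrix& b) {
    return Sum<Matrix, Matrix>(a, b);
}

// Hypothetical usage helper: performs m3 = m1 + m2 and returns one entry.
inline double demo_sum_entry() {
    Matrix m1(2, 2, 1.5), m2(2, 2, 2.0), m3(2, 2);
    m3 = m1 + m2;  // Sum<Matrix,Matrix> temporary, then Matrix::operator=
    return m3.read(1, 1);
}
```

In this sketch the Sum temporary lives only for the duration of the statement m3 = m1 + m2. If the compiler inlines operator+, operator= and read(), the temporary can disappear entirely and the statement reduces to a plain loop over the entries, which is the kind of optimization discussed above.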