Hi, C++ Lovers!

I am using the boost::array template class to generalize my hand-crafted vector specializations for dimensions 2, 3 and 4.
Because performance is of great importance to me, I have written an initial benchmark that tests how well g++ can unroll loops whose iteration count can be determined at compile time or on entry to the loop. The gcc switch "-funroll-loops" should do exactly that. The test program computes the dot product of two four-dimensional int arrays 10 million times. It does this once with a general and once with a specialized version of the dot product, general_dot() and special_dot() respectively:

    #include <cstddef>
    #include <iostream>
    #include <boost/array.hpp>
    #include "../Timer.hpp"   // local Timer class (not shown in this post)

    template <typename T, std::size_t N>
    inline T general_dot(const boost::array<T, N> & a,
                         const boost::array<T, N> & b)
    {
        T c = 0;
        for (std::size_t i = 0; i < N; i++) {
            c += a[i] * b[i];
        }
        return c;
    }

    template <typename T>
    inline T special_dot(const boost::array<T, 4> & a,
                         const boost::array<T, 4> & b)
    {
        return (a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]);
    }

    template <typename T, std::size_t N>
    std::ostream & operator << (std::ostream & os, const boost::array<T, N> & a)
    {
        os << '[';
        for (std::size_t i = 0; i < N; i++) {
            os << ' ' << a[i];
        }
        os << ']';
        return os;
    }

    typedef int S;   ///< Scalar type.

    int main(int argc, char * argv[])
    {
        typedef boost::array<S, 4> T;
        T a;
        a.assign(3);
        T b = a;
        Timer t;
        const unsigned int nloops = 10000000;

        S sum = 0;
        t.reset();
        for (unsigned int i = 0; i < nloops; i++) {
            sum += general_dot(a, b);
        }
        t.read();
        std::cout << "general: " << t << std::endl;

        S tum = 0;
        t.reset();
        for (unsigned int i = 0; i < nloops; i++) {
            tum += special_dot(a, b);
        }
        t.read();
        std::cout << "special: " << t << std::endl;

        if (sum == tum) {
            std::cout << "Checksums are equal. OK" << std::endl;
        } else {
            std::cout << "Checksums are not equal. NOT OK" << std::endl;
        }
        return 0;
    }

Compiling with g++-3.3.6 using the switches "-O3 -funroll-all-loops" and running this on my Pentium 4 gives the following benchmark:

    general: 60.965ms
    special: 902us
    Checksums are equal. OK

As we can see, general_dot() performs terribly compared to special_dot(). It is roughly 68 times slower (60.965 ms vs. 0.902 ms). Is g++-3.3.6 really that bad at optimizing, or have I forgotten something? Do I have to switch to gcc 4.0, 4.1 or 4.2 to get g++ to compile the instantiation of general_dot() into code whose performance is similar or equal to that of special_dot()?

Many thanks in advance,

Per Nordlöw
Swedish Defence Research Agency
Linköping, Sweden

_______________________________________________
help-gplusplus mailing list
help-gplusplus@gnu.org
http://lists.gnu.org/mailman/listinfo/help-gplusplus