On Mon, 4 Jan 2016, Tim Adowski wrote: > Using the START_LOG/STOP_LOG macros to time the QP loop, I got the following > results for the 1D shape function implementation: > Original: 1028ms (average) > Modified with Fe loop vectorized: 920ms (average) > Difference: ~10% > So there is a significant speedup to be gained by switching up the shape > function implementation, even with only one of the loops vectorized.
This is actually really impressive. The Ke loop should be more expensive than the Fe loop; if you're getting 10% speedup with Fe alone vectorized then we should have tried this years ago... > An interesting and highly problematic note is that there is an open bug in > GCC wherein auto-vectorization is not possible if the loop iteration > variable is of type "unsigned int". > Bug reports: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57206 and > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48052 > Referring to the Fe loop, I was only able to achieve vectorization in GCC > (all versions) by removing the "unsigned" from the iteration variable. Hmm... in general, using unsigned int can prevent optimizations because the compiler is required to generate correct code for overflow/underflow cases; by contrast with signed int overflow and underflow are undefined behavior so the compiler is allowed to assume they will never happen and generate instructions accordingly. I don't see how this could be a problem in your code, though - you're taking only signed int arguments in your fe_shape_function class, and even if those get generated from unsigned int loop counters via conversion the compiler should still be able to assume thereafter that overflow can't happen. > Moving forward, it would seem that refactoring the shape function > implementation would yield cleaner code. > For example, fe_base calls vector.resize() many times in order to fully > initialize the 2D vector. However, with the new class, these calls are > unnecessary since for the 1D vector, vector.resize() only needs to be called > once from within the class. Indeed. We'll want to keep support for vector<vector> for backwards compatibility, but we can create a shim subclass of vector which has a "array_resize(N,M)" method or some such to remain compatible with whatever the new class uses. > The [][] operator could also be replaced with parentheses (i,j) or a > get_val(i,j) function, since the code to implement the double > brackets adds quite a bit of complexity to the class. This we'll want to keep, I'm afraid. Complexity under the hood is a reasonable price to pay for keeping people's older apps working. --- Roy ------------------------------------------------------------------------------ _______________________________________________ Libmesh-devel mailing list Libmesh-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/libmesh-devel