On Mon, 4 Jan 2016, Tim Adowski wrote:

> Using the START_LOG/STOP_LOG macros to time the QP loop, I got the following 
> results for the 1D shape function implementation:
> Original: 1028ms (average)
> Modified with Fe loop vectorized: 920ms (average)
> Difference: ~10%
> So there is a significant speedup to be gained by switching up the shape 
> function implementation, even with only one of the loops vectorized.

This is actually really impressive.  The Ke loop should be more
expensive than the Fe loop; if you're getting 10% speedup with Fe
alone vectorized then we should have tried this years ago...

> An interesting and highly problematic note is that there is an open bug in 
> GCC wherein auto-vectorization is not possible if the loop iteration
> variable is of type "unsigned int".
> Bug reports: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57206 and 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48052
> Referring to the Fe loop, I was only able to achieve vectorization in GCC 
> (all versions) by removing the "unsigned" from the iteration variable.

Hmm... in general, using unsigned int can prevent optimizations
because the compiler is required to generate correct code for
overflow/underflow cases; by contrast with signed int overflow and
underflow are undefined behavior so the compiler is allowed to assume
they will never happen and generate instructions accordingly.

I don't see how this could be a problem in your code, though - you're
taking only signed int arguments in your fe_shape_function class, and
even if those get generated from unsigned int loop counters via
conversion the compiler should still be able to assume thereafter that
overflow can't happen.

> Moving forward, it would seem that refactoring the shape function 
> implementation would yield cleaner code.
> For example, fe_base calls vector.resize() many times in order to fully 
> initialize the 2D vector. However, with the new class, these calls are
> unnecessary since for the 1D vector, vector.resize() only needs to be called 
> once from within the class.

Indeed.  We'll want to keep support for vector<vector> for backwards
compatibility, but we can create a shim subclass of vector which has a
"array_resize(N,M)" method or some such to remain compatible with
whatever the new class uses.

> The [][] operator could also be replaced with parentheses (i,j) or a
> get_val(i,j) function, since the code to implement the double
> brackets adds quite a bit of complexity to the class.

This we'll want to keep, I'm afraid.  Complexity under the hood is a
reasonable price to pay for keeping people's older apps working.
---
Roy

------------------------------------------------------------------------------
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel

Reply via email to