Yes, try to do the vector indexing yourself first to see if it's the operator calls that are throwing things off.
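For reference, here is a minimal sketch of what "doing the indexing yourself" could look like. It is not taken from the code under discussion: the function name `accumulate_ke_flat`, the mass-matrix-style update, and the names `phi` and `JxW` are all assumptions made for illustration. The only point is that the inner loop touches a flat row-major buffer through raw pointers rather than going through a matrix class's `operator()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (illustration only): Ke(i,j) += JxW * phi[i] * phi[j],
// written against a flat row-major buffer instead of a matrix class.
void accumulate_ke_flat(std::vector<double>& Ke,
                        const std::vector<double>& phi,
                        double JxW)
{
  const std::size_t n = phi.size();
  assert(Ke.size() == n * n);  // Ke must hold an n-by-n matrix

  // Local raw pointers keep the data references in the loop as simple
  // as possible for the compiler's vectorizer to analyze.
  double* ke = Ke.data();
  const double* p = phi.data();

  for (std::size_t i = 0; i < n; ++i)
  {
    const double w = JxW * p[i];  // hoisted out of the inner loop
    for (std::size_t j = 0; j < n; ++j)
      ke[i * n + j] += w * p[j];  // manual row-major indexing, no operator()
  }
}
```

Building this with something like `g++ -O3 -fopt-info-vec-missed` should report whether GCC still rejects the inner loop. If the flat version vectorizes and the `Ke(i,j)` version does not, that would point at the operator calls rather than the loop itself.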
I did a bunch of work on this myself a few years ago... all I was attempting to speed up was just variable value evaluation... not Re/Ke evaluation as a whole. Let me see if I can dig up what I did... (I'll do some searching and send another email).

I eventually dropped it because, while it gave some speedup in some extreme cases (like with thousands of variables to evaluate), it was marginal (or even slower) for the more common cases (1-10 variables).

Honestly, 10% is not worth it to me. Any real application (that isn't example 3) is going to have WAY more going on that can't be vectorized anyway. If 10% is our best case then I don't think this extra complexity is worth it. Further, non-vectorizable work is typically perfectly parallel, which means I can just use 10% more cores and get the same effect now... which is easy to do.

Hopefully a bit more work will yield larger gains.

Derek

On Tue, Jan 5, 2016 at 12:30 AM Roy Stogner <royst...@ices.utexas.edu> wrote:

>
> On Mon, 4 Jan 2016, Tim Adowski wrote:
>
> > However, all versions of GCC were unable to vectorize the Ke loop
> > due to "bad data ref", and both Intel versions required "#pragma
> > ivdep" in order to vectorize the Ke loop.
>
> One last thought: is it possible that what is confusing gcc isn't your
> class, but rather the DenseMatrix class? Try replacing "Ke(i,j)" with
> "my_vector[i*M+j]" or whatever and see if gcc can handle that?
> ---
> Roy
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Libmesh-devel mailing list
> Libmesh-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/libmesh-devel
>