As long as we're on this subject, let me throw a few things out there....

I've recently spent a good amount of time optimizing our variable value /
gradient / second derivative evaluation code.  I've been trying to get as
much of it to vectorize as possible using the Intel compiler.  This is
important to us because we have a _lot_ of variables and a lot of
quadrature points (because we sometimes use high-order FEs).  When I say a
lot: one of our users just came in and said he's going to be using 4,000
variables soon, and I just ran a simulation with 1,024 variables, totaling
over 100 million DoFs.  When you have this many variables, just computing
their values and gradients at the quadrature points becomes a significant
burden...

What I've run into, though, is that there is no good way to get gradient
evaluation to vectorize, because we are working with vectors of Gradient
objects... meaning that the values themselves are not contiguous in memory.
So when we're doing something like this:

  // Accumulate the solution gradient at each quadrature point:
  //   grad_u[qp] += soln_local * grad_phi[i][qp]
  for (unsigned int i=0; i < num_dofs; i++)
  {
    const Number soln_local = current_solution(dof_indices[i]);

    for (unsigned int qp=0; qp < nqp; qp++)
      grad_u[qp].add_scaled(grad_phi[i][qp], soln_local);
  }

that inner for loop can't be vectorized, because those grad_phi[i][qp]
objects aren't contiguous in memory.

I tried reading them out into a flat vector and then vectorizing that
operation... and it did work (the actual multiply by the solution only took
microseconds), but the time spent copying the gradients into a flat vector
and then back into Gradient objects outweighed the gain over the normal,
unvectorized code (although not by as much as you would expect!).
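
Roughly, the experiment looked like this (just a sketch reusing the names
from the loop above... flat_grads / flat_grad_u and the hard-coded 3
components are purely illustrative):

  // Copy every component of grad_phi into one contiguous array:
  std::vector<Real> flat_grads(num_dofs * nqp * 3);
  for (unsigned int i=0; i < num_dofs; i++)
    for (unsigned int qp=0; qp < nqp; qp++)
      for (unsigned int d=0; d < 3; d++)
        flat_grads[(i*nqp + qp)*3 + d] = grad_phi[i][qp](d);

  // This multiply-add is unit-stride, so the compiler vectorizes it:
  std::vector<Number> flat_grad_u(nqp*3, 0.);
  for (unsigned int i=0; i < num_dofs; i++)
  {
    const Number soln_local = current_solution(dof_indices[i]);

    for (unsigned int j=0; j < nqp*3; j++)
      flat_grad_u[j] += soln_local * flat_grads[i*nqp*3 + j];
  }

  // ...but copying back into Gradient objects eats up the savings:
  for (unsigned int qp=0; qp < nqp; qp++)
    for (unsigned int d=0; d < 3; d++)
      grad_u[qp](d) = flat_grad_u[qp*3 + d];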

If we're redesigning this stuff... is there any way we could come up with
an interface that would give us flat vectors?  I would love to get back one
long vector containing all of the gradients evaluated at each quadrature
point for each shape function... that would make the evaluation much more
amenable to vectorization AND to GPU computation.  Vectorization is
becoming much more important these days: the new Sandy Bridge architecture
from Intel and Interlagos from AMD can both double the number of FLOPs per
cycle compared to previous-generation processors for multiply-add
operations (which is exactly what we're doing!), but only if you're
utilizing the vector ops...
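
To be concrete, something with roughly this shape is what I have in mind
(purely hypothetical... get_flat_dphi() isn't a real libMesh call, and dim
here is just the element dimension):

  // Hypothetical accessor: get_flat_dphi() does not exist in libMesh.
  // The idea is one long contiguous array holding every gradient
  // component, indexed (say) as flat_dphi[(i*nqp + qp)*dim + d].
  const std::vector<Real> & flat_dphi = fe->get_flat_dphi();

  // With that layout the accumulation is one unit-stride multiply-add
  // loop per dof: no copy-in/copy-out, and the same kernel maps nicely
  // onto a GPU.
  std::vector<Number> grad_u_flat(nqp*dim, 0.);

  for (unsigned int i=0; i < num_dofs; i++)
  {
    const Number soln_local = current_solution(dof_indices[i]);

    for (unsigned int j=0; j < nqp*dim; j++)
      grad_u_flat[j] += soln_local * flat_dphi[i*nqp*dim + j];
  }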

Another thing:

We are now supporting meshes that contain elements of different dimensions
(i.e. 1D elements and 3D elements in the same mesh... not sharing nodes,
although they do transfer information back and forth).  In order to do that
seamlessly we have to copy the shape function values in and out of our own
internal data structures.  It would be awesome if we could _pass in_ the
vectors to be filled by libMesh, instead of having libMesh own the data and
pass us back references.  Then we could avoid a lot of copying by passing
the same vectors into multiple FE objects (of different dimension) and
asking them to fill those vectors when we're on the corresponding
elements...
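
Something along these lines (again purely hypothetical... none of these
calls exist, it's just meant to show the ownership flipping around):

  // Hypothetical "caller owns the storage" interface: request_phi() and
  // request_dphi() do not exist in libMesh, and fe_1d / fe_3d stand in
  // for FE objects of different dimension attached to the same mesh.
  std::vector<std::vector<Real> >         phi;   // our storage, not libMesh's
  std::vector<std::vector<RealGradient> > dphi;

  fe_3d->request_phi(phi);     // "fill these when you reinit"
  fe_3d->request_dphi(dphi);
  fe_1d->request_phi(phi);     // same buffers handed to the 1D FE
  fe_1d->request_dphi(dphi);

  // Then, on each element, whichever FE matches that element's dimension
  // fills phi/dphi in place and we assemble straight out of our own
  // vectors, with no copying between libMesh's data and ours:
  //   if (elem->dim() == 3)  fe_3d->reinit(elem);
  //   else                   fe_1d->reinit(elem);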

Just some junk to think about while we're redesigning some of this stuff.

Derek

On Thu, May 10, 2012 at 4:37 PM, Roy Stogner <royst...@ices.utexas.edu> wrote:

>
> Paul Bauman has an application that's motivating him to put
> vector-valued FE capabilities into libMesh, and I've got a
> shame-this-wasn't-done-years-ago that's motivating me to help, so we're
> hashing out design ideas now.
>
> Two design decisions that've come up, submitted for general commentary:
>
>
> 1.:  For vector-valued data, it would be good to return shape values as
> a vector-of-vector-of-Gradients instead of
> vector-of-vector-of-Numbers, and shape derivatives would be a
> vector-of-vector-of-Tensors.  (we'll probably postpone second
> derivative support and add a Rank3Tensor class or some such for it).
>
> All agreed?  The alternative (returning
> vector-of-vector-of-vector-of-Numbers, etc) would be much uglier for
> user code.
>
>
> The catch with this is that the current FEBase already stores
> vector-of-vector-of-Numbers.  And returns it, directly, via inline
> get_phi().
>
>
> So, 2.:  Do we create FEAbstract, which doesn't include *phi* data or
> get_*phi* methods, then derive FEBase and FEVectorBase from it?  Or do
> we just add get_vec_*phi* methods (and vec_*phi data) to FEBase?
>
> With the latter option, we'd have a bunch of redundant (albeit empty)
> vectors in every FEBase, and trying to access scalar data from a
> vector element type or vice-versa would be a runtime error (or
> possibly only an assert), not very type safe.
>
> With the former option, we'd end up having to turn get_*phi* from
> inline functions into virtual functions.  This is what we're leaning
> towards.  The cost of a virtual function is relatively trivial when
> amortized.  I benchmark a few percent overhead when compared to even
> dead simple bilinear-fe linear residual-type loops, a few thousandths
> of a percent overhead when compared to biquadratic slightly nonlinear
> jacobian-type loops.
>
> The cost of preventing crazy optimizations can be huge: in the inlined
> case icpc originally detected that my fakey benchmark couldn't be
> changing phi in-between get_phi calls, reordered my loops to make the
> element loop the inner loop, then vectorized and combined my equations
> to chop out 90% of the FLOPs... but even a slight added code
> complication (a do-nothing but non-const method called on the fake FE
> object) took things back to normal; I can't imagine anything similar
> affecting a real code where we call FE::reinit on every element.
> ---
> Roy