On Thu, 25 Apr 2019, Derek Gaston wrote:

> This is an email from 3 years ago... no one responded :-)

Man, and I was just starting to feel proud of myself for starting to
catch up on *months*-old issues...

> This is coming up again because we're looking at "array variables"
> again... and this would be a large optimization.

Are you sure?  IIRC one of Paul's students experimented years ago with
what I had *thought* would be the lowest-hanging fruit on
vectorization, switching the order of the shape-function and
quadrature-point indices in our FE element-local arrays, but then
reported only minimal speedup on assembly: single-digit percentages,
not double.

> Any comments?

Well, there's certainly no harm in trying.  I'm done digging about in
DofObject for the #2095 changes, and those were actually surprisingly
orthogonal to the dof_number code anyway, so even if I suddenly hanker
to finish #1438 it might not step on any toes.

We currently have a ton of code that assumes dof_number is sorted
first by owning processor_id, but other than that we're flexible (e.g.
variable vs node sorting) and we should be able to become more
flexible still without breaking anything.

> I'm working on some low-level optimization stuff... and one of the
> things I want to do is more vectorization when computing the value
> of variables and when computing residuals, etc.  I'm using the
> variable groups stuff to be able to do large vector operations.  To
> that end... I think that the current choice for dof-ordering within
> variable groups could be changed to be more amenable to
> vectorization.
> 
> Currently DofObject uses dof numbering based on this ordering for
> variable groups:
> 
> id = base + var_in_vg*ncomp + comp
> 
> The problem with this is that I would like to do a vector operation
> like this:
> 
> phi_i * all_dofs_in_var_group_corresponding_to_i
> 
> With any FE type that has more than one component, the above
> ordering means that the dofs corresponding to that shape function
> are spread out in memory (i.e. they're NOT contiguous), and that
> would preclude vectorization of the above operation.

So you're doing operations directly on the DoFs, not on evaluations at
quadrature points?

> Instead, if we use a dof ordering like this:
> 
> id = base + comp*n_var_in_vg + var_in_vg
> 
> All of the dofs that need to multiply the same shape function would
> be contiguous and easily vectorized.
> 
> I don't think this change would affect anyone.  We've never
> guaranteed this ordering (and it's fairly new anyway)... I think
> everyone is probably using the API instead of thinking of raw memory
> access like this (and I know I probably should be too... but I've
> been doing it that way for over 10 years and I have a few
> applications that have hundreds to tens-of-thousands of variables
> now that could really use this optimization).
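
For illustration, here is a minimal standalone sketch of the two orderings quoted above. The function names and the sample sizes are hypothetical and are not libMesh API; the sketch only checks which ids come out contiguous when the component is held fixed and the variable index runs over the group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not libMesh API.

// Current ordering: id = base + var_in_vg*ncomp + comp
std::size_t dof_id_var_major(std::size_t base, std::size_t var_in_vg,
                             std::size_t comp, std::size_t ncomp)
{
  assert(comp < ncomp);
  return base + var_in_vg * ncomp + comp;
}

// Proposed ordering: id = base + comp*n_var_in_vg + var_in_vg
std::size_t dof_id_comp_major(std::size_t base, std::size_t var_in_vg,
                              std::size_t comp, std::size_t n_var_in_vg)
{
  assert(var_in_vg < n_var_in_vg);
  return base + comp * n_var_in_vg + var_in_vg;
}

// Ids of every variable in the group for one fixed component, i.e. the
// dofs that would all multiply the same shape function.
std::vector<std::size_t> ids_for_component(bool comp_major, std::size_t base,
                                           std::size_t comp,
                                           std::size_t n_var_in_vg,
                                           std::size_t ncomp)
{
  std::vector<std::size_t> ids;
  ids.reserve(n_var_in_vg);
  for (std::size_t v = 0; v < n_var_in_vg; ++v)
    ids.push_back(comp_major
                    ? dof_id_comp_major(base, v, comp, n_var_in_vg)
                    : dof_id_var_major(base, v, comp, ncomp));
  return ids;
}

// True if each id is exactly one more than the previous id.
bool is_contiguous(const std::vector<std::size_t> & ids)
{
  for (std::size_t i = 1; i < ids.size(); ++i)
    if (ids[i] != ids[i - 1] + 1)
      return false;
  return true;
}
```

With base 100, 4 variables and 3 components, component 1 maps to ids 101, 104, 107, 110 under the first formula (stride ncomp) and to 104, 105, 106, 107 under the second. When ncomp is 1 the two formulas coincide.
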

The most obvious catch here is that dof_number is so far into inner
loops that my usual "make as much stuff runtime-selectable as
possible" demand is completely overridden by performance concerns;
this would have to be a configure-time option IMHO.
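
A rough sketch of what a configure-time switch could look like, so that the choice costs nothing in the inner loop. The macro EXAMPLE_DOF_COMP_MAJOR and the function example_dof_id are made up for this illustration; they are not an existing libMesh configure option or API, and the real dof_number code would need more than this.

```cpp
#include <cstddef>

// Hypothetical macro: a configure script would define it (or not) in a
// generated config header.  Not an actual libMesh option.
#ifdef EXAMPLE_DOF_COMP_MAJOR
constexpr bool comp_major_ordering = true;
#else
constexpr bool comp_major_ordering = false;
#endif

// The ordering is fixed when the library is built, so the compiler can
// fold the branch away instead of testing a runtime flag per call.
constexpr std::size_t example_dof_id(std::size_t base,
                                     std::size_t var_in_vg,
                                     std::size_t comp,
                                     std::size_t n_var_in_vg,
                                     std::size_t ncomp)
{
  return comp_major_ordering
           ? base + comp * n_var_in_vg + var_in_vg  // proposed ordering
           : base + var_in_vg * ncomp + comp;       // current ordering
}
```
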

If you want to do it yourself I don't see any objections; if you'd
like me to take first crack then start up an issue and assign me so I
don't forget about the idea again?
---
Roy
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel
