On Thu, 25 Apr 2019, Derek Gaston wrote:

> This is an email from 3 years ago... no one responded :-)

Man, and I was just starting to feel proud of myself for starting to
catch up on *months*-old issues...

> This is coming up again because we're looking at "array variables"
> again... and this would be a large optimization.

Are you sure?  IIRC one of Paul's students experimented years ago with
what I had *thought* would be the lowest-hanging fruit on
vectorization, switching the order of the shape-function and
quadrature-point indices in our FE element-local arrays, but then
reported only minimal speedup on assembly: single-digit percentages,
not double.

> Any comments?

Well, there's certainly no harm in trying.  I'm done digging about in
DofObject for the #2095 changes, and those were actually surprisingly
orthogonal to the dof_number code anyway, so even if I suddenly hanker
to finish #1438 it might not step on any toes.

We currently have a ton of code that assumes dof_number is sorted
first by owning processor_id, but other than that we're flexible
(e.g. variable vs node sorting) and we should be able to become more
flexible still without breaking anything.

> I'm working on some low-level optimization stuff... and one of the
> things I want to do is more vectorization when computing the value
> of variables and when computing residuals, etc.  I'm using the
> variable groups stuff to be able to do large vector operations.  To
> that end... I think that the current choice for dof-ordering within
> variable groups could be changed to be more amenable to
> vectorization.
>
> Currently DofObject uses dof numbering based on this ordering for
> variable groups:
>
> id = base + var_in_vg*ncomp + comp
>
> The problem with this is that I would like to do a vector operation
> that is like this:
>
> phi_i * all_dofs_in_var_group_corresponding_to_i
>
> With any FE types that have more than one component the above
> ordering means that the dofs corresponding to that shape function
> are spread out in memory (i.e. they're NOT contiguous) and that
> would preclude vectorization of the above operation.

So you're doing operations directly on the DoFs, not on evaluations at
quadrature points?

> Instead, if we use a dof ordering like this:
>
> id = base + comp*n_var_in_vg + var_in_vg
>
> All of the dofs that need to multiply the same shape function would
> be contiguous and easily vectorized.
>
> I don't think this change would affect anyone.  We've never
> guaranteed this ordering (and it's fairly new anyway)... I think
> everyone is probably using the API instead of thinking of raw memory
> access like this (And I know I probably should be too... but I've
> been doing it that way for over 10 years and I have a few
> applications that have hundreds to tens-of-thousands of variables
> now that could really use this optimization).

The most obvious catch here is that dof_number is so far into inner
loops that my usual "make as much stuff runtime-selectable as
possible" demand is completely overridden by performance concerns;
this would have to be a configure-time option IMHO.

If you want to do it yourself I don't see any objections; if you'd
like me to take first crack then start up an issue and assign me so I
don't forget about the idea again?
---
Roy
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel
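
For anyone skimming the thread, the two orderings quoted above can be
restated as a pair of small index functions.  This is only a sketch:
the function names are hypothetical and are not part of the libMesh
API, and the asserts are illustrative bounds checks rather than
anything DofObject is known to do.  The parameter names (base,
var_in_vg, comp, ncomp, n_var_in_vg) are taken from the formulas in
the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, NOT libMesh API: they only restate the two
// formulas quoted in the thread.

// Current ordering: id = base + var_in_vg*ncomp + comp
// Components of one variable are adjacent; the same component of
// neighboring variables is ncomp entries apart.
std::size_t dof_id_var_major(std::size_t base, std::size_t var_in_vg,
                             std::size_t comp, std::size_t ncomp)
{
  assert(comp < ncomp); // illustrative bounds check only
  return base + var_in_vg * ncomp + comp;
}

// Proposed ordering: id = base + comp*n_var_in_vg + var_in_vg
// The same component of neighboring variables is adjacent (stride 1).
std::size_t dof_id_comp_major(std::size_t base, std::size_t var_in_vg,
                              std::size_t comp, std::size_t n_var_in_vg)
{
  assert(var_in_vg < n_var_in_vg); // illustrative bounds check only
  return base + comp * n_var_in_vg + var_in_vg;
}
```

As a worked case, take base = 100, four variables in the group and
ncomp = 3.  For comp = 1 the current ordering gives ids 101, 104, 107,
110 across the four variables (stride 3), while the proposed ordering
gives 104, 105, 106, 107 (stride 1), which is the contiguous run the
vector operation wants.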