Hi
As you all remember, the LagrangeInterpolator class can already do
interpolation in parallel on non-matching meshes for Lagrange elements. I had
another look at what it would take to get interpolation working in parallel for
any element, and it turns out I do not need to change all that much. I have
implemented a first solution in the branch
https://bitbucket.org/fenics-project/dolfin/branch/mikael/interpolation-parallel
just to test, in a function called "LagrangeInterpolator::interpolateall". This
is what the algorithm looks like to get interpolation working in parallel for
any element:
Parallel interpolation from Function u0 to Function u living on different
meshes with different partitioning:
1) Create a set of all the distinct points of u that will require an eval of
u0 (like tabulate_all_coordinates, but stored as a set of coordinates).
2) Create global bounding boxes for the partitioned mesh of u0 and distribute
them to all processes.
3) Using the bounding boxes, compute the processes that *may* own the points in
need of eval (see the sketch after this list).
4) Distribute the interpolation points to the potential owners, which then try
to evaluate u0 at those points. If successful, return the values (results of
eval) of u0 to the owner.
5) Now all the results of u0.eval will be on the processes that need them. It
remains to run over all cells locally (for u) and call evaluate_dofs using
these values. This is a bit tricky, since evaluate_dofs needs to take a
ufc::function as argument. To this end I have a solution where I wrap all
results of u0.eval in an Expression that maps from a point x to the
corresponding result:
// Create map from coordinate to global result of eval
static std::map<std::vector<double>, std::vector<double>, lt_coordinate>
  coords_to_values(lt_coordinate(1.0e-12));

// ... Fill map coords_to_values using MPI ...

// Wrap coords_to_values in an Expression
class WrapperFunction : public Expression
{
public:

  WrapperFunction(std::size_t value_shape) : Expression(value_shape) {}

  mutable std::vector<double> _xx;

  void eval(Array<double>& values, const Array<double>& x) const
  {
    // Copy x into a std::vector so it can be used as map key
    _xx.resize(x.size());
    for (std::size_t j = 0; j < x.size(); j++)
      _xx[j] = x[j];

    // Map from x to the u0.eval result (possibly computed on another process)
    const std::vector<double>& v = coords_to_values[_xx];

    // Put the u0.eval result into the values of the Expression
    for (std::size_t j = 0; j < v.size(); j++)
      values[j] = v[j];
  }
};
6) Run over the local mesh calling evaluate_dofs with the wrapped function as
the ufc::function.
7) Update the coefficients of the local vector with the results from
evaluate_dofs.
Finished:-)
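Steps 2) and 3) are essentially the same bounding-box trick the Lagrange-only
version already uses. Just to make it concrete, here is a minimal sketch of
what I mean, using raw MPI and made-up helper names (an illustration, not the
code in the branch):

#include <mpi.h>
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Bounding box of the local part of the mesh, stored as
// [xmin_0, ..., xmin_{gdim-1}, xmax_0, ..., xmax_{gdim-1}]
std::vector<double> local_bounding_box(const std::vector<double>& coordinates,
                                       std::size_t gdim)
{
  std::vector<double> bbox(2*gdim);
  for (std::size_t d = 0; d < gdim; ++d)
  {
    bbox[d]        =  std::numeric_limits<double>::max();
    bbox[gdim + d] = -std::numeric_limits<double>::max();
  }
  for (std::size_t i = 0; i < coordinates.size(); i += gdim)
    for (std::size_t d = 0; d < gdim; ++d)
    {
      bbox[d]        = std::min(bbox[d],        coordinates[i + d]);
      bbox[gdim + d] = std::max(bbox[gdim + d], coordinates[i + d]);
    }
  return bbox;
}

// Step 2): distribute the local bounding boxes of the mesh of u0 to all
// processes
std::vector<double> gather_bounding_boxes(MPI_Comm comm,
                                          std::vector<double> bbox)
{
  int num_processes;
  MPI_Comm_size(comm, &num_processes);
  std::vector<double> all_bboxes(bbox.size()*num_processes);
  MPI_Allgather(bbox.data(), static_cast<int>(bbox.size()), MPI_DOUBLE,
                all_bboxes.data(), static_cast<int>(bbox.size()), MPI_DOUBLE,
                comm);
  return all_bboxes;
}

// Step 3): a process whose bounding box contains x *may* own the point
std::vector<int> candidate_owners(const std::vector<double>& all_bboxes,
                                  const double* x, std::size_t gdim)
{
  std::vector<int> owners;
  const std::size_t num_processes = all_bboxes.size()/(2*gdim);
  for (std::size_t p = 0; p < num_processes; ++p)
  {
    const double* bbox = &all_bboxes[2*gdim*p];
    bool inside = true;
    for (std::size_t d = 0; d < gdim; ++d)
      inside = inside && x[d] >= bbox[d] && x[d] <= bbox[gdim + d];
    if (inside)
      owners.push_back(static_cast<int>(p));
  }
  return owners;
}

Step 4) is then one round of communication of the flattened points to these
candidate owners, followed by the same pattern in reverse for the eval results.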
I have tested it for Nedelec elements of order 1 and bubble elements. Higher
order Nedelec elements do not have a tabulate_coordinates function implemented,
which makes it a bit more difficult to create the list of all interpolation
points. I think this can be solved quite easily, though, by running over the
local mesh once and collecting all x's used in the evals (see the sketch
below).
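Here is a minimal sketch of what I mean, reusing the lt_coordinate comparator
from the snippet above (the class name and the usage are made up for
illustration, this is not code from the branch): an Expression whose only job
is to record every coordinate it is evaluated at. Interpolating it once onto
the function space of u makes evaluate_dofs call eval at exactly the points
that will later be requested from u0.

#include <set>
#include <vector>
#include <dolfin.h>

using namespace dolfin;

// Records every coordinate passed to eval; the values themselves are dummies
class PointCollector : public Expression
{
public:

  PointCollector(std::size_t value_shape,
                 std::set<std::vector<double>, lt_coordinate>& points)
    : Expression(value_shape), _points(points) {}

  void eval(Array<double>& values, const Array<double>& x) const
  {
    // Copy x into a std::vector and remember it
    std::vector<double> p(x.size());
    for (std::size_t j = 0; j < x.size(); ++j)
      p[j] = x[j];
    _points.insert(p);

    // Dummy values, never used for anything
    for (std::size_t j = 0; j < values.size(); ++j)
      values[j] = 0.0;
  }

private:
  std::set<std::vector<double>, lt_coordinate>& _points;
};

// Usage (V is e.g. a higher order Nedelec space on the mesh of u):
//   std::set<std::vector<double>, lt_coordinate> points(lt_coordinate(1.0e-12));
//   PointCollector collector(3, points);   // 3 = value size of the space
//   Function tmp(V);
//   tmp.interpolate(collector);            // fills 'points' as a side effect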
I do not have any strong opinion on ufc::cell vs dolfin::cell or collective vs
non-collective eval, but I certainly think both of these should be available to
the user. The collective eval is not used in the above algorithm, because it is
more efficient to collect all interpolation points in one large structure and
distribute them than it is to handle each point by itself.
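Just to spell out what I mean by one large structure: something along these
lines, where all points destined for a given candidate owner end up in one
flat buffer that can be shipped in a single message (illustrative names only,
not code from the branch):

#include <cstddef>
#include <vector>

// One send buffer per process; buffer p holds the flattened coordinates of
// all points that process p may be able to evaluate (from the bounding box
// test in step 3)
std::vector<std::vector<double>>
pack_points_per_owner(const std::vector<std::vector<double>>& points,
                      const std::vector<std::vector<int>>& owners_of_point,
                      std::size_t num_processes)
{
  std::vector<std::vector<double>> send_buffers(num_processes);
  for (std::size_t i = 0; i < points.size(); ++i)
    for (int p : owners_of_point[i])
      send_buffers[p].insert(send_buffers[p].end(),
                             points[i].begin(), points[i].end());
  return send_buffers;
}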
The parallel interpolation routine should probably be put in the Function class
and not in the specific LagrangeInterpolator class, but for now I've just been
testing.
M
> On 04 Sep 2015, at 11:02, Chris Richardson <[email protected]> wrote:
>
> On 03/09/2015 15:54, Martin Sandve Alnæs wrote:
>> On 3 September 2015 at 16:50, Garth N. Wells <[email protected]> wrote:
>>> On 3 September 2015 at 14:42, Chris Richardson <[email protected]>
>>> wrote:
>>> On 03/09/2015 14:10, Martin Sandve Alnæs wrote:
>>> Part of the complexity in this chain stems from the numerous dolfin
>>> signatures. As discussed before, these also need cleanup to clarify
>>> collective/non-collective operations. While at it, we could also
>>> vectorize on x by allowing a number of coordinates wherever there's
>>> an
>>> x argument in the code below.
>>> For a moment I'd like to ignore the ufc/ffc parts and figure out
>>> how
>>> many eval signatures we really need in dolfin? In particular to
>>> resolve the collective/non-collective part. Then we can design the
>>> callback mechanism to match the ideal dolfin design.
>>> Here's one take (dodging the type of 'cell'):
>>> Is it as simple as just having a "collective" and "non_collective"
>>> eval?
>>> Typically, processes will iterate over some set of "local" points,
>>> asking for evaluations ( - OK, they may be "local" on one mesh, but
>>> not another).
>>> When the point is not "on process", the process needs to get it,
>>> somehow.
>>> But all the other processes are unaware - they are thinking about
>>> different points.
>>> It is quite unusual for all processes to want to evaluate the same
>>> point at the same time, for a simple collective operation.
>>> My thought was that we would have to store up a list of points, and
>>> then do a collective operation to resolve all the cross-process
>>> evaluations.
>>> Or maybe I have missed something...
>> My impression is that Martin is just suggesting that we split evals
>> into non-collective and collective 'groups', and have as few versions
>> of eval in each group as possible. With fewer evals, we have fewer code
>> paths to think about.
>> The interface for collective evals is orthogonal to this, and
>> something we need to look closely at.
>> Garth
>> Exactly. The collective variants of eval will always end up calling
>> non-collective variants of eval but never the other way around.
>> Also the collective eval variants won't be virtual functions.
>> The evaluation of a GenericFunction on cells that are known to be
>> on the local process differs between Function and Expression, but
>> can be handled with a single signature "virtual void eval(v,x,cell)
>> const = 0;"
>> in GenericFunction, implemented by Function and by the user's subclass
>> of Expression. The only thing we need to clean up there in the
>> ufc::function
>> is the ufc::cell vs dolfin::cell.
>> For a collective eval over multiple points, I currently see only two
>> variants:
>> a) each process passes a different set of point(s) and wants the
>> corresponding results
>> b) each process actually passes the same point(s) and wants all results
>> In both cases there is first communication of 'who owns which points',
>> then each process makes calls to the non-collective evals for its own
>> points,
>> then the results are communicated back to the right place.
>> I believe a) is what Chris was talking about as the most common
>> operation.
>
> OK, makes sense. I suppose I am thinking ahead to an implementation of
> 'interpolate',
> where fitting the collective calls in might be tricky.
>
> Chris
>
> --
> Chris Richardson
> BP Institute
> Madingley Road
> Cambridge CB3 0EZ
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics