On Tue, 2 Sep 2008, Tim Kroeger wrote:

> Okay, I understand.  I didn't expect the mesh to occupy so much memory, 
> particularly since I have a lot of systems on that mesh.

Unfortunately it does.  I don't know about your application, but keep
in mind that more systems result in larger degree-of-freedom objects,
which are allocated by the Mesh class and whose memory is only
distributed when you use a ParallelMesh.

> I always wondered 
> whether project_vector() will serialize all of them at the same time or 
> consecutively.

The variables in a single vector for a System are all serialized
simultaneously; the different vectors attached to a System and the
different Systems in an EquationSystems are all serialized
consecutively.
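
Roughly, in the spirit of the actual loop in system_projection.C (the
helper name here is made up for illustration):

  // Systems, and the vectors within each system, are handled one
  // after another; within each vector, every variable's
  // coefficients are serialized together.
  for (unsigned int s = 0; s != es.n_systems(); ++s)
    {
      System & sys = es.get_system(s);
      project_one_vector (*sys.solution);      // all variables at once
      for (System::vectors_iterator v = sys.vectors_begin();
           v != sys.vectors_end(); ++v)
        project_one_vector (*v->second);       // then each extra vector
    }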

> Of course, I couldn't find any reason to serialize them at 
> the same time,

Performance, especially if they're using the same finite element
types.  We could probably save a few CPU cycles (at the cost of more
memory) by only walking through the mesh once, and only constructing
and factoring each different edge/face/volume projection matrix once
in the case of non-Lagrange elements.

But that's nontrivial work to get an uncertain speed improvement at
the cost of an uncertain memory allocation, so I'm not motivated to do
it.

>> working on the easiest improvement to that right now: we'll keep
>> creating the serial vector, but localize to it with a properly built
>> send_list instead of doing a global localization.  This is the same
>> thing we do with the other serial vectors, and it should be
>> sufficient.
>
> Okay, sounds great.  Let me know when you've finished that.

Will do.  There will be a slight loss of (hopefully unused)
functionality:

Currently project_vector() assumes that, if it's handed a serial
vector, that vector might have valid coefficients everywhere.  In
order to get rid of the O(N) communications in transient systems, I'd
like to start assuming that the only coefficients that need to be
valid in the output vector are the ones for local and ghost degrees of
freedom.  This will be fine for the library (whose only projected
serial vectors are the localized old solutions in TransientSystem or
FEMSystem with a transient solver attached), but any code which
maintains truly global vectors would now have to explicitly
resynchronize them after an EquationSystems::reinit().
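
In user code that would mean something like this sketch, where
my_global_vector is a hypothetical vector maintained outside the
library:

  equation_systems.reinit();
  // After the change, only local and ghost coefficients of projected
  // serial vectors are guaranteed valid, so a truly global copy has
  // to be rebuilt with a full localization:
  System & sys = equation_systems.get_system("my_system");
  sys.solution->localize (*my_global_vector);  // refreshes every entry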

I'm just going to presume that anyone doing anything that low level is
also reading libmesh_users assiduously; other than a change in the
system.h / doxygen documentation, this will be the only warning.  ;-)

> As far as I understand, currently the NumericVector interface class (with its 
> implementations such as PetscVector) is used as a parallel as well as a 
> serial vector.

Yes.

> I notice that there is a method localize() that seems to transform a
> serial into a parallel vector,

The reverse: localize() takes a parallel vector; with no send_list it
gives you a fully valid global vector, and with a send_list it gives
you a global vector which only has valid coefficients at the listed
indices.
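
That is, assuming a parallel vector parallel_vec and serial
destination vectors of matching global size:

  // Full localization: every entry of serial_vec becomes valid.
  parallel_vec.localize (serial_vec);

  // With a send_list, only the listed entries (plus the local ones)
  // are guaranteed valid afterwards.
  parallel_vec.localize (sparse_serial_vec, send_list);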

> but I can't find a corresponding "serialize()" method.

There is no reverse method to produce a parallel vector from a global
one; I think the only place that's done in the library is in
System::reinit, so perhaps nobody ever bothered to factor it out.

Since NumericVector::set() is virtual, we probably ought to move that
loop into a "parallel_update()" to turn one virtual function call per
dof per update into just one per update.
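
That is, something like the following, where parallel_update() is the
proposed method, not one that exists yet:

  // Today: one virtual set() call per local dof.
  for (unsigned int i = vec.first_local_index();
       i != vec.last_local_index(); ++i)
    vec.set (i, global_values[i]);
  vec.close ();

  // Proposed: a single virtual call per update instead.
  // vec.parallel_update (global_values);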

> Perhaps I have not yet understood the concept correctly (which seems
> not to be a good qualification to perform some modifications to that
> class).

Definitely not, if you were going to be modifying things unassisted.
But I think part of the problem with the class is the API's learning
curve, and a newer user who isn't already familiar with its quirks
will probably find it easier to see where that could be improved.

I found out today that there are a couple other PECOS project folks
who are interested in the idea of an abstract NumericVector interface
to both PETSc and Trilinos; I'll try to talk them into borrowing
libMesh's.  I don't think it's an urgent priority for us now, but
there's at least a chance of getting additional motivated help with it
this year.

> Concerning the compatibility of a "SparseVector" class with the other vector 
> formats: Wouldn't it be easiest to let "SparseVector" default to a serial 
> NumericVector for all vector formats other than Petsc at first?  (I would 
> assume that most users are using PETSc.)

That makes a lot of sense.  For that matter, I should probably stop
referring to it as "SparseVector"; we could just use the existing
serial NumericVector API, with at most an additional send_list option
in the constructor.
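
Something along these lines, using the existing API for the fallback
(the send_list overload is the part that doesn't exist yet):

  // Fall back to an ordinary serial vector for any backend:
  AutoPtr<NumericVector<Number> > v = NumericVector<Number>::build();
  v->init (n_dofs, n_dofs);     // every coefficient stored locally

  // A PETSc specialization could someday allocate storage for only
  // the send_list indices; purely hypothetical for now:
  // v->init (n_dofs, send_list);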

>> In particular, try wrapping a log around lines 79 through 107 of 
>> system_projection.C and make sure that the scalability failure is there.
>
> Okay, I did this now.
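
(For the archives: by "wrapping a log" I meant the START_LOG/STOP_LOG
pair from libmesh_logging.h, with a label of your choosing:

  START_LOG ("project_vector_core", "System");
  // ... the code at lines 79 through 107 ...
  STOP_LOG  ("project_vector_core", "System");

so that the region shows up as its own entry in the PerfLog summary.)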

Thanks.  This is interesting.  My thoughts:

These numbers include ~500 seconds of project_vector time out of ~3500
seconds total.  Not good, but not the 50% penalty I thought we were
seeing before.  We used to double-count some time expenditures in
PerfLog results, but I thought Ben's new stack-based PerfLog features
were supposed to eliminate that, and the disappearance of
non-fine-grained project_vector costs in your results seems to support
my belief.

There are some other surprising expenses here.  Parallel::max()
taking 0.6 seconds per call, 400 seconds total!?  That seems pretty
messed up even for a slow interconnect.  The different processors
also report widely varying results here.  I'm not sure how to account
for what I'm seeing.

But back in project_vector: the localizations are taking ~110 seconds
total, but ~150 seconds are spent in projection computations that
should be scaling perfectly with Nproc.  Another ~240 seconds are
spent in enforce_constraints_exactly, but it looks like there's some
inefficient localization there that I'd forgotten about; I'll fix
that too.

> By the way: I found a funny typo in libmesh_logging.h; see attached patch.

That's a silly one; thanks!  The fix is committed now.
---
Roy
