Re: [Libmesh-users] Performance of EquationSystems::reinit() with ParallelMesh

Roy Stogner Thu, 04 Sep 2008 08:00:40 -0700

On Thu, 4 Sep 2008, Tim Kroeger wrote:

> On Wed, 3 Sep 2008, Roy Stogner wrote:
>
> What about a better documentation of this?  I have attached a patch for this.
>
>>> but I can't find a corresponding "serialize()" method.
>> 
>> There is no reverse method to produce a parallel vector from a global
>> one;
>
> I see, you are also calling serial vectors "global vectors" now.


Just one subset of serial vectors: those for which every coefficient
is valid.

> This way, the method localize() creates a global vector,

Not necessarily: localize() makes more data local to each processor,
which could produce what I've just started calling a "global vector"
or which could only fill in ghost node values, depending on whether or
not you give it a send list.
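
To make the two cases concrete, here is a minimal standalone sketch. It is not libMesh code: ToyParallelVector, ToySerialVector, localize_all() and localize_ghosted() are hypothetical names invented for illustration, and the "ranks" are simulated inside one process. It only shows that the same localize idea yields either a fully valid ("global") serial vector or one where just the owned and send-list entries are valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a parallel vector (not libMesh's NumericVector):
// rank r owns the contiguous index range [first[r], first[r+1]).
struct ToyParallelVector {
  std::vector<std::size_t> first;          // size = n_ranks + 1
  std::vector<std::vector<double>> owned;  // owned[r] = coefficients of rank r

  std::size_t size() const { return first.back(); }

  // Rank that owns global index i.
  std::size_t owner(std::size_t i) const {
    assert(i < size());
    std::size_t r = 0;
    while (i >= first[r + 1]) ++r;
    return r;
  }

  double value(std::size_t i) const {
    const std::size_t r = owner(i);
    return owned[r][i - first[r]];
  }
};

// Hypothetical serial vector: full-length storage plus a validity flag
// per coefficient, so "global" means every flag is true.
struct ToySerialVector {
  std::vector<double> values;
  std::vector<bool> valid;
};

// No send list: every coefficient becomes valid (a "global vector").
ToySerialVector localize_all(const ToyParallelVector& v) {
  ToySerialVector out{std::vector<double>(v.size(), 0.0),
                      std::vector<bool>(v.size(), true)};
  for (std::size_t i = 0; i < v.size(); ++i) out.values[i] = v.value(i);
  return out;
}

// With a send list: only the entries owned by `rank` and the listed
// ghost entries are filled in; everything else stays an invalid zero.
ToySerialVector localize_ghosted(const ToyParallelVector& v, std::size_t rank,
                                 const std::vector<std::size_t>& send_list) {
  ToySerialVector out{std::vector<double>(v.size(), 0.0),
                      std::vector<bool>(v.size(), false)};
  for (std::size_t i = v.first[rank]; i < v.first[rank + 1]; ++i) {
    out.values[i] = v.value(i);
    out.valid[i] = true;
  }
  for (std::size_t i : send_list) {
    out.values[i] = v.value(i);
    out.valid[i] = true;
  }
  return out;
}
```

Note that the ghosted result in this toy still allocates all N entries, most of them meaningless zeros, which is the cost discussed further down.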

> which does not seem really intuitive to me.

Me neither.

>> Not if you're going to be modifying things unassisted, definitely.
>> But I think part of the problem with the class is the API's learning
>> curve, and it's probably easier for a newer user who isn't already
>> familiar with the quirks to see where that could be improved.
>
> That might be correct.  So if the only task for me is to discuss things with 
> you, then I'm in.

I'll copy you on any discussions we start up then, but again, there
probably won't be any NumericVector API work (not counting the recent
work on the Trilinos subclass) for a while.

>> That makes a lot of sense.  For that matter I should probably stop
>> referring to it as "SparseVector"; we could just use the existing
>> serial NumericVector API, with at most an additional send_list option
>> in the constructor.
>
> Does the vector have to decide on its construction whether it is a parallel, 
> serial or sparse vector?

Currently the constructor makes the decision between serial and
parallel vector.  In the near term, I'd like to replace non-global
serial vectors with sparse vectors to get rid of the O(N) allocations
of zeros.  In the long term, I'd like to also replace parallel vectors
with sparse vectors to get rid of the solution/current_local_solution
redundancies.  What all that means for the API I haven't quite figured
out yet.
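
As a rough illustration of the storage argument only (not a proposal for the eventual API, which is undecided), here is a standalone sketch of a vector that stores just the indices a processor cares about. ToySparseVector and all of its members are hypothetical names; nothing here exists in libMesh.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sparse replacement for a non-global serial vector:
// it knows the global size N but allocates storage only for the
// indices it was told about (owned plus ghost entries).
class ToySparseVector {
public:
  ToySparseVector(std::size_t global_size,
                  const std::vector<std::size_t>& indices)
      : global_size_(global_size) {
    for (std::size_t i : indices) {
      assert(i < global_size_);
      entries_[i] = 0.0;
    }
  }

  // True if index i is stored locally.
  bool has(std::size_t i) const { return entries_.count(i) != 0; }

  double get(std::size_t i) const {
    auto it = entries_.find(i);
    assert(it != entries_.end());  // reading a non-local entry is an error
    return it->second;
  }

  void set(std::size_t i, double x) {
    auto it = entries_.find(i);
    assert(it != entries_.end());  // writing a non-local entry is an error
    it->second = x;
  }

  std::size_t size() const { return global_size_; }      // logical size N
  std::size_t stored() const { return entries_.size(); } // actual storage

private:
  std::size_t global_size_;
  std::unordered_map<std::size_t, double> entries_;
};
```

With a global size of a million and three local indices, stored() is 3 rather than 1000000; a hash map is just the simplest container for the sketch, and a sorted index array would likely be the more practical choice.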

> By the way, if I add additional vectors to a system using add_vector(), I 
> assume they are stored parallel as well, right?

Let me see... I think System postpones the initialization, so that you
can init() such vectors manually (like TransientSystem::init_data
does) to make them serial or parallel as you prefer.

>> These numbers include ~500 seconds of project_vector time out of ~3500
>> seconds total.  Not good, but not the 50% penalty I thought we were
>> seeing before.
>
> Strange enough, because still the program gives the impression of spending 
> much more than 50% of the time in EquationSystems::reinit().

Right, but EquationSystems::reinit() is more than just
project_vector().  All but a hundred seconds or so of DofMap work is
definitely from ES::reinit() related activity, for example.  It's just
that project_vector() was the only part of that I knew still had some
scalability problems.

Hmm... except that on a SerialMesh, most of what the DofMap does is
forced to be O(N) to keep the mesh consistent; it's only on
ParallelMesh that DofMap behavior should be O(N/Nproc).  In that case
while there are surely still efficiency improvements to be made, you may
be stuck until I get a chance to finish the adaptive ParallelMesh
debugging at some unknown future date.  Sorry about that.

> The question hence is: What else does EquationSystems::reinit() do,
> other than System::project_vector()?

DoF renumbering, constraint calculations, sparse matrix
reconstruction, off the top of my head.

> I notice that DofMap::reinit() takes a large amount of time; does that method 
> scale well?

Probably not; DofMap operations are supposed to be O(N/Nproc) on a
ParallelMesh now, but it's stuck using O(N) communications to keep a
SerialMesh consistent, I believe.  On a SerialMesh this may actually
be slower than the "everybody do redundant work on the entire mesh"
code we used before, depending on your interconnect speed.

> Perhaps, a lot of time is just used by waiting of one processor for another 
> one to finish some task?

That can happen, and it would explain some of the differing results on
different processors' logs.

> In particular since Parallel::min() takes much less time per call.

What may be confusing here is that both min and max are templated to
work on scalars or vectors.  Perhaps min is only being called on
scalars in your code but max is getting called on vectors.  Perhaps we
need to use different logging names for the raw vector and the
vector<bool> implementations.
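
A standalone sketch of what separate logging names could look like, assuming nothing about the real Parallel:: implementation: toy_max() and CallLog are hypothetical, and the per-rank inputs are passed in explicitly instead of being communicated. The point is only that two overloads sharing one name can still log under distinct labels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical call counter keyed by log label.
using CallLog = std::map<std::string, int>;

// Scalar reduction: per_rank[r] is the value contributed by rank r.
template <typename T>
T toy_max(const std::vector<T>& per_rank, CallLog& log) {
  assert(!per_rank.empty());
  ++log["max(scalar)"];
  return *std::max_element(per_rank.begin(), per_rank.end());
}

// Vector reduction: per_rank[r] is the vector contributed by rank r;
// the result is the entrywise maximum.  This overload is more
// specialized, so it is chosen for vector-of-vector arguments.
template <typename T>
std::vector<T> toy_max(const std::vector<std::vector<T>>& per_rank,
                       CallLog& log) {
  assert(!per_rank.empty());
  ++log["max(vector)"];
  std::vector<T> result = per_rank.front();
  for (const auto& v : per_rank) {
    assert(v.size() == result.size());
    for (std::size_t i = 0; i < v.size(); ++i)
      result[i] = std::max(result[i], v[i]);
  }
  return result;
}
```

With a single shared label, a handful of expensive vector reductions would be averaged together with many cheap scalar ones, which could make max look much slower per call than min.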

>> But back in project_vector: The localizations are taking ~110 seconds
>> total, but ~150 seconds are spent in projection computations that
>> should be scaling perfectly with Nproc.  ~240 seconds are spent in
>> enforce_constraints_exactly, but it looks like there's some
>> inefficient localization there that I'd forgotten about; I'll fix that
>> too.
>
> If you weren't so many kilometers away, I would invite you to join me 
> watching my logfile to see that the code appears to be inside 
> EquationSystems::reinit() nearly all the time.

I believe it; depending on where the Parallel::, FE::, and All runtime 
is coming from most often, that log suggests anywhere from 52-93%
of your run time isn't coming from your assembly or solve.

We ought to make "count every second only on the top of the stack's
perflog" vs. "count every second on the whole stack's perflog" into a
user-controlled option... both are useful for optimizing, and the
latter info would be helpful here.
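
For what it's worth, the two accounting modes are easy to keep side by side. The following is a standalone sketch, not the existing PerfLog: ToyPerfLog is a hypothetical class, and the timestamps are passed in by the caller so the behavior is deterministic. Each pop() charges the elapsed time to the event's whole-stack ("inclusive") total, and charges only the time not spent in nested events to its top-of-stack ("self") total.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stack-based performance log tracking both modes at once.
class ToyPerfLog {
public:
  // Start timing `label` at time `now` (seconds, supplied by the caller).
  void push(const std::string& label, double now) {
    stack_.push_back({label, now, 0.0});
  }

  // Stop timing the event on top of the stack at time `now`.
  void pop(double now) {
    assert(!stack_.empty());
    const Frame f = stack_.back();
    stack_.pop_back();
    const double elapsed = now - f.start;
    inclusive_[f.label] += elapsed;           // whole-stack accounting
    self_[f.label] += elapsed - f.child_time; // top-of-stack accounting
    if (!stack_.empty()) stack_.back().child_time += elapsed;
  }

  double self_time(const std::string& label) const {
    return lookup(self_, label);
  }

  double inclusive_time(const std::string& label) const {
    return lookup(inclusive_, label);
  }

private:
  struct Frame {
    std::string label;
    double start;       // time at which the event was pushed
    double child_time;  // total time spent in nested events
  };

  static double lookup(const std::map<std::string, double>& m,
                       const std::string& label) {
    auto it = m.find(label);
    return it == m.end() ? 0.0 : it->second;
  }

  std::vector<Frame> stack_;
  std::map<std::string, double> self_;
  std::map<std::string, double> inclusive_;
};
```

If "reinit" is pushed at t=0, "project_vector" runs inside it from t=1 to t=4, and "reinit" is popped at t=10, the sketch reports 7 seconds of self time and 10 seconds of inclusive time for "reinit". The inclusive number is the one that would answer "how much of the run is inside EquationSystems::reinit()".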
---
Roy
