I see. It is very surprising that parallel_sync() would affect
rank-interior nodes. A bug?

On Wed, Nov 26, 2014, 23:34 Derek Gaston <fried...@gmail.com> wrote:

> Nope - it doesn't appear to be at the processor boundary only... but I
> would have to study it more.
>
> I really think that with a more careful algorithm we could get a much
> better estimate (if not the right answer).
>
> Derek
>
> On Thu, Nov 27, 2014 at 12:19 AM, Dmitry Karpeyev <dkarp...@gmail.com>
> wrote:
>
>>
>> Presumably, this overestimation happens only at the "boundary" nodes i
>> that are contained in elements living on other MPI ranks? Those foreign
>> ranks will count couplings (edges) i-j that their elements share with the
>> elements on rank p that owns i. Since only edge counts are communicated
>> back to p, there is no way to eliminate these duplicates. Could we build a
>> full sparsity pattern for such nodes _only_? That way the memory cost can
>> be controlled, yet the duplicates would be eliminated. You would, however,
>> need to communicate the edges rather than their counts.
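>>
>> Roughly what I have in mind, as a standalone sketch (the types and
>> functions here are hypothetical, not the actual libMesh data structures):
>> keep a set of column indices per boundary row, merge in the edges the
>> other ranks send, and only take sizes at the end. Summed counts cannot be
>> deduplicated, but merged sets can.
>>
>> #include <cstddef>
>> #include <map>
>> #include <set>
>> #include <utility>
>> #include <vector>
>>
>> using dof_id = std::size_t;
>>
>> // For each boundary row this rank owns: the set of coupled columns seen
>> // so far (from local elements and from foreign ranks).
>> std::map<dof_id, std::set<dof_id>> boundary_pattern;
>>
>> // Merge the edge list a foreign rank sends for rows we own. Edges that
>> // were already counted locally or by another rank simply disappear in
>> // the set insert, so nothing is double counted.
>> void receive_boundary_edges(const std::vector<std::pair<dof_id, dof_id>> & edges)
>> {
>>   for (const auto & e : edges)
>>     boundary_pattern[e.first].insert(e.second);
>> }
>>
>> // Exact nonzero count for a boundary row, duplicates already gone.
>> std::size_t n_nonzeros(const dof_id row)
>> {
>>   return boundary_pattern[row].size();
>> }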
>>
>> Dmitry.
>>
>> On Wed, Nov 26, 2014, 22:34 Derek Gaston <fried...@gmail.com> wrote:
>>
>> Ben Spencer (copied on this email) pointed me to a problem he was having
>> today with some of our sparsity pattern augmentation stuff.  It was causing
>> PETSc to error out saying that the number of nonzeros on a processor was
>> more than the number of entries on a row for that processor.  The weirdness
>> is that this didn't happen if he just ran on one processor...
>>
>> Thinking that the problem was in our code (I believed we might have been
>> double counting somewhere) I started tracing this problem tonight... and
>> what I found is that libMesh is grossly overestimating the number of
>> nonzeros per row when running in parallel.  And since our code is set up to
>> believe that libMesh is producing the "perfect" number of nonzeros per row
>> we are blindly adding to an already inflated number that pushes us past the
>> size of the row...
>>
>> Here's what's happening when running on 8 processors (for this one DoF
>> that I'm tracing... which is #168):
>>
>> 1.  DofMap::operator() is computing the correct number of nonzeros (in
>> this case 60).
>>
>> I'm taking 60 as the correct number because that's the final number for
>> this DoF when it's run in serial (I haven't actually dug in to see which
>> DoF this is and manually computed the sparsity pattern... yet).
>>
>> Judging by this it seems that all of the dofs connected to #168 must be
>> local (again, not completely verified... but the fact that
>> DofMap::operator() comes up with the same number as the final number in
>> serial is a good indicator).
>>
>> 2.  SparsityPattern::Build::parallel_sync() totally screws up.
>>
>> Putting print statements around line 3076 in dof_map.C, I can see that
>> n_nz for #168 goes up to 117! Even worse... n_oz ALSO goes up to 117!
>>
>>
>>
>> Remember: the correct number for n_nz + n_oz should be _60_.  So we are
>> basically going to tell PETSc to set aside 4x as much memory for that row
>> as is necessary.
>>
>> 3.  n_nz and n_oz get chopped down around line 3076 in dof_map.C...
>>
>> n_nz gets chopped so that it's min(n_nz, n_dofs_on_proc).  In my case
>> n_dofs_on_proc is 108 on that processor, so n_nz gets set to that.
>>
>> n_oz gets chopped so that it's min(n_oz, dofs_not_on_proc)... you can see
>> how that could be _very_ bad!
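>>
>> In other words, the clamping amounts to something like this (a sketch of
>> how I read it; the variable names are mine, not the exact ones in
>> dof_map.C):
>>
>> #include <algorithm>
>> #include <cstddef>
>> #include <vector>
>>
>> // n_nz/n_oz are the per-row on- and off-processor nonzero estimates for
>> // the rows this processor owns. The clamp just caps each estimate at the
>> // largest value that row could possibly have.
>> void clamp_estimates(std::vector<unsigned int> & n_nz,
>>                      std::vector<unsigned int> & n_oz,
>>                      const unsigned int n_dofs_on_proc,
>>                      const unsigned int n_dofs_not_on_proc)
>> {
>>   for (std::size_t i = 0; i < n_nz.size(); ++i)
>>     {
>>       n_nz[i] = std::min(n_nz[i], n_dofs_on_proc);
>>       n_oz[i] = std::min(n_oz[i], n_dofs_not_on_proc);
>>     }
>> }
>>
>> So a wildly inflated count never errors out on its own; it just quietly
>> gets pinned at the maximum.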
>>
>> 4.  Now the (overestimated) n_nz and n_oz get passed to MOOSE for
>> modification and we start adding to n_nz/n_oz for dof couplings that
>> libMesh definitely didn't know about... but since n_nz is sitting at the
>> max possible already we blow past the number of dofs on this proc and then
>> PETSc errors (like it should).
>>
>>
>>
>> So... my question is this: is this really the best "estimate" we can do
>> in this case?
>>
>> This is a tiny problem in 3D with only 3 variables.  This will be MUCH
>> worse if you have, say, 2000 variables... you could be telling PETSc to
>> allocate ENORMOUS chunks of memory that are unnecessary.  I know that PETSc
>> could throw a bunch of that memory away after the first filling... but we
>> don't allow that in MOOSE because often we are pre-allocating for future
>> connections.  But even if you were to let it do that, there could be a
>> HUGE memory spike in the beginning until PETSc frees up a bunch of memory.
>>
>> It seems like this code is currently a worst-case estimate of what could
>> happen.  It _does_ look like it might be better if we built the full
>> sparsity pattern... but that has its own memory problems.
>>
>>
>>
>> Also... it looks like there is a lot more parallel communication than
>> necessary going on here.  We're sending large vectors of information from
>> proc to proc... even in the case where we're not building a full sparsity
>> pattern.  It seems like each processor could just send a minimal message
>> of "hey, I have this many dofs that are connected to these rows you
>> own"... i.e. one scalar per row instead of a bunch of entries.
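>>
>> Something like the following is the shape I'm picturing (a sketch only;
>> the names and the owner_of callback are hypothetical, and the actual
>> send/receive plumbing is left out):
>>
>> #include <cstddef>
>> #include <map>
>> #include <utility>
>> #include <vector>
>>
>> using dof_id  = std::size_t;
>> using proc_id = unsigned int;
>>
>> // For each foreign processor, build the (row, count) pairs we would send:
>> // one scalar per row that processor owns, instead of the whole list of
>> // column indices we see coupled to it.
>> std::map<proc_id, std::vector<std::pair<dof_id, unsigned int>>>
>> tally_remote_couplings(const std::map<dof_id, std::vector<dof_id>> & local_pattern,
>>                        proc_id (*owner_of)(dof_id),
>>                        const proc_id my_rank)
>> {
>>   std::map<proc_id, std::vector<std::pair<dof_id, unsigned int>>> to_send;
>>
>>   for (const auto & row : local_pattern)
>>     {
>>       const proc_id owner = owner_of(row.first);
>>       if (owner != my_rank)
>>         to_send[owner].emplace_back(row.first,
>>                                     static_cast<unsigned int>(row.second.size()));
>>     }
>>
>>   return to_send;
>> }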
>>
>> So... should I take a stab at redoing some of this code?  I think it's
>> possible to get a much better estimate, and to do so with much less
>> parallel communication.  I probably wouldn't mess with the code that does
>> the full sparsity pattern... I would just pull out the non-full sparsity
>> pattern code and make a different function that gets called if you're not
>> building a full sparsity pattern.  That probably should be done either way
>> (look at the huge "if" with duplicated code for each case in
>> DofMap::operator()).
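>>
>> Structurally, I'm picturing something along these lines (just the shape
>> of the split; the function names here are hypothetical, not existing
>> libMesh code):
>>
>> // Existing path, untouched: builds and keeps the whole pattern.
>> void build_full_sparsity_pattern();
>>
>> // New path: only computes per-row nonzero counts, never stores the
>> // pattern itself.
>> void estimate_nonzeros_per_row();
>>
>> // One dispatch up front instead of a giant "if" threaded through a
>> // single routine.
>> void build_sparsity(const bool need_full_sparsity_pattern)
>> {
>>   if (need_full_sparsity_pattern)
>>     build_full_sparsity_pattern();
>>   else
>>     estimate_nonzeros_per_row();
>> }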
>>
>> Or do one of you guys see a quick fix that does something better?
>>
>> (Oh - BTW, I'm going to implement the same use of min() in MOOSE's
>> sparsity pattern augmentation stuff to get us through for now - so this
>> isn't necessarily time sensitive.)
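>>
>> Concretely, the stopgap on our side would look something like this (the
>> function and parameter names here are mine, not actual MOOSE code):
>>
>> #include <algorithm>
>> #include <cstddef>
>> #include <vector>
>>
>> // When our augmentation adds extra couplings to a row, cap the result the
>> // same way libMesh does, so an already-inflated estimate can never get
>> // pushed past the row size PETSc will accept.
>> void augment_row(std::vector<unsigned int> & n_nz,
>>                  std::vector<unsigned int> & n_oz,
>>                  const std::size_t row,
>>                  const unsigned int extra_on,
>>                  const unsigned int extra_off,
>>                  const unsigned int n_dofs_on_proc,
>>                  const unsigned int n_dofs_not_on_proc)
>> {
>>   n_nz[row] = std::min(n_nz[row] + extra_on, n_dofs_on_proc);
>>   n_oz[row] = std::min(n_oz[row] + extra_off, n_dofs_not_on_proc);
>> }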
>>
>> Derek
>>
>>