On Wed, 11 Mar 2009, Tim Kroeger wrote:

> Still, I wonder why it cannot be replicated by writing the grid to a file and 
> re-reading it in.  Will that change the distribution of the dofs to the 
> processors?

Often, yes.  IIRC our partitioning results (at least with
METIS/ParMETIS, maybe not with space filling curves) depend on mesh
history, not just the current mesh state.

So yeah, that would explain why the problem isn't being replicated by
a .xdr restart.  We've got some parallel Mesh I/O now, and perhaps
loading a different format preserves partitioning by default?  I'd
have to ask Ben or Derek.

> All variables are first order Lagrange.  Hence, the systems differ only in 
> the system type and the number of variables:
>
> As said before, the crash is triggered by system 4.  I would suspect that it 
> only occurs on transient system

That has to be the case.

> -- although this surprises me since the other systems contain
> ghosted vectors as well, don't they?

They do, but they don't project those vectors.  Only solution gets
projected; current_local_solution just gets re-localized from the
result.
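
To make "re-localized" concrete, here is a toy model of the idea.  It
is only a sketch under stated assumptions: localize() and its arguments
are hypothetical names, not libMesh API, and a plain std::vector stands
in for the parallel solution.  The ghosted copy holds the locally owned
entries plus whatever off-processor entries the send list names, and
nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy stand-in for a ghosted vector (hypothetical, not libMesh API):
// owned entries [first_local, end_local) plus copies of the
// off-processor entries named in send_list.
std::map<std::size_t, double>
localize(const std::vector<double>& global_solution,
         std::size_t first_local, std::size_t end_local,
         const std::vector<std::size_t>& send_list)
{
  assert(first_local <= end_local && end_local <= global_solution.size());

  std::map<std::size_t, double> local;

  // Locally owned entries are always present.
  for (std::size_t i = first_local; i < end_local; ++i)
    local[i] = global_solution[i];

  // Off-processor entries are present only if the send list asks for them.
  for (std::size_t i : send_list) {
    assert(i < global_solution.size());
    local[i] = global_solution[i];
  }
  return local;
}
```

In this model, any index that is neither owned nor in send_list simply
does not exist in the local copy, so code that reads such an index has
nothing valid to read.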

>> Well, there's the hard way:  Compile the crashing test case in devel
>> mode
>
> You mean debug mode, right?

I think devel mode would be enough - all you need is "-g" to make gdb
tolerable.  The big change in debug mode over devel isn't debugging
symbols in the binary, it's all the GNU STL safety testing that gets
turned on.

>> But let's try the either-easy-or-futile way first: I'll write a
>> (possibly redundant) patch to make sure we're properly getting
>> constraint dependency dofs into the send list, and you can try running
>> with that.  We can at least verify or rule out my first guess before
>> we need more information to come up with a second guess.
>
> I vote for this easy way and wait for your patch. (-:

The patch just got committed to SVN.  Unless Ben can point out some
subtle send_list machinations I've missed, the fact that it's even
theoretically possible to be missing a constraint dependency is a bug.
Let's just hope that it's the *only* bug and that we're not missing
something else as well.
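
For anyone following along, the idea can be sketched in standalone C++.
This is not the committed patch and not libMesh's implementation:
DofId, ConstraintRow and add_constraint_dependencies() are hypothetical
names, and the sketch assumes each processor knows the constraint rows
for every dof it needs, ghosted ones included.  It extends a send list
with each off-processor dof that a needed dof's constraint depends on,
following chains of constraints.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

using DofId = std::size_t;
// One constraint row: constrained dof = sum of coeff * dependency dof.
using ConstraintRow = std::vector<std::pair<DofId, double>>;

// Hypothetical helper: return send_list extended with every
// off-processor dof that a constraint on a needed dof depends on.
std::set<DofId>
add_constraint_dependencies(std::set<DofId> send_list,
                            const std::map<DofId, ConstraintRow>& constraints,
                            DofId first_local, DofId end_local)
{
  assert(first_local <= end_local);
  auto is_local = [&](DofId d) { return d >= first_local && d < end_local; };

  // Start from every dof we already need: ghosted ones and owned ones.
  std::vector<DofId> pending(send_list.begin(), send_list.end());
  for (DofId d = first_local; d < end_local; ++d)
    pending.push_back(d);

  while (!pending.empty()) {
    const DofId d = pending.back();
    pending.pop_back();

    const auto row = constraints.find(d);
    if (row == constraints.end())
      continue;  // unconstrained dof, nothing to add

    for (const auto& term : row->second) {
      const DofId dep = term.first;
      // A new off-processor dependency must be ghosted too, and its
      // own constraint (if any) has to be followed in turn.
      if (!is_local(dep) && send_list.insert(dep).second)
        pending.push_back(dep);
    }
  }
  return send_list;
}
```

With local dofs 2 and 3, an initial send list of {5}, dof 3 constrained
in terms of dofs 7 and 2, and dof 7 constrained in terms of dof 9, the
helper returns {5, 7, 9}: dof 7 is a direct dependency and dof 9 is
reached through it.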

> Nevertheless, I'll try out whether the crash occurs on 2 CPUs as well.

Let me know.  If this patch doesn't fix it, I'm out of easy ideas.
---
Roy

_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel