On Tue, 15 Apr 2008, Tim Kroeger wrote:

> Dear libMesh team,
>
> My application code crashes when running in parallel.  After
> backtracing the crash for about two days, I found that in dof_map.C,
> after line 175 (in the function DofMap::set_nonlocal_dof_objects()),

Ah, one of the functions I wrote...  Now I see why it's called "svn
blame".

> the value of request_to_fill[i] is different from that of
> requested->id().  As far as I understood from reading the code, this
> should not happen.

Not only should it not happen, but even if it did happen it should
have already tripped an assert!  That dofobject_accessor is just a
shim to MeshBase::node_ptr or MeshBase::elem.  With SerialMesh both
those methods include an assert (whatever->id() == i).  With
ParallelMesh both those methods assert (whatever == NULL ||
whatever->id() == i), then the DofMap asserts (whatever != NULL)
afterward.
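
To make that layering concrete, here is a minimal sketch of the checks described above. It is not libMesh code: ToyObject, toy_assert, serial_lookup, parallel_lookup and caller_lookup are hypothetical names invented for illustration, and toy_assert throws instead of aborting so that a failed check can be observed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an object that knows its own id
// (not libMesh's DofObject).
struct ToyObject {
  std::size_t id_;
  std::size_t id() const { return id_; }
};

// Throwing stand-in for assert(), so a failed check can be caught.
inline void toy_assert(bool condition, const char* what) {
  if (!condition) throw std::logic_error(what);
}

// Serial-style lookup: every slot is expected to be populated, and the
// stored object must report the id it was looked up by.
inline const ToyObject* serial_lookup(const std::vector<ToyObject*>& slots,
                                      std::size_t i) {
  const ToyObject* obj = slots.at(i);
  toy_assert(obj != nullptr && obj->id() == i, "serial: id mismatch");
  return obj;
}

// Parallel-style lookup: a slot may be empty (the object lives elsewhere),
// but a non-empty slot must still report the matching id.
inline const ToyObject* parallel_lookup(const std::vector<ToyObject*>& slots,
                                        std::size_t i) {
  const ToyObject* obj = slots.at(i);
  toy_assert(obj == nullptr || obj->id() == i, "parallel: id mismatch");
  return obj;
}

// Caller-side check layered on top: the object must actually be present.
inline const ToyObject* caller_lookup(const std::vector<ToyObject*>& slots,
                                      std::size_t i) {
  const ToyObject* obj = parallel_lookup(slots, i);
  toy_assert(obj != nullptr, "caller: object missing");
  return obj;
}
```

With both layers in place, a pointer that survives caller_lookup(slots, i) is non-null and has id() == i, which is why a later mismatch would be surprising.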

> Question #1: Did I understand the code correctly, i.e. am I right that
> this should not happen?

Absolutely.

> Question #2: Does one of you guys have an idea why this could happen
> (other than my application code doing stupid things)?

Not a clue.  Even if your application code tried to renumber ids
itself, this sort of crash could happen in opt mode but it should at
least hit a useful assert in devel mode.

> Question #3: Would you mind adding an
>
>       assert(request_to_fill[i]==requested->id());
>
> at this place, commit it to the repository, and check what that does
> to your applications?

I'd rather not commit it to the repository yet (since in theory it
should be redundant) but I'll try it out on the examples and on my own
apps just in case the theory is incorrect.  Have you tried this assert
in your own code yet?  Does it trigger?
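
For trying the check outside the library, something along these lines might do. It is only a sketch under assumptions: request_to_fill and requested are the names from your message, while RequestedObject, first_id_mismatch and the accessor callback are hypothetical and do not correspond to anything in dof_map.C.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for whatever the accessor hands back.
struct RequestedObject {
  std::size_t id_;
  std::size_t id() const { return id_; }
};

// Returns the index of the first request whose retrieved object is missing
// or reports a different id; returns request_to_fill.size() if every
// request is consistent.
inline std::size_t first_id_mismatch(
    const std::vector<std::size_t>& request_to_fill,
    const std::function<const RequestedObject*(std::size_t)>& accessor) {
  for (std::size_t i = 0; i != request_to_fill.size(); ++i) {
    const RequestedObject* requested = accessor(request_to_fill[i]);
    // This is the proposed assert, turned into a reportable condition.
    if (requested == nullptr || request_to_fill[i] != requested->id())
      return i;
  }
  return request_to_fill.size();
}
```

Returning the offending index rather than aborting makes it possible to print which request went wrong, and on which processor, before stopping.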

> Question #4: Will I want to try to create a simple example that
> reproduces the problem, although this might take me quite a long time?
> (This question is rhetorical.)

Unfortunately so.  Parallel debugging is hard enough when you've got
the breaking app right in front of you; it's practically impossible by
proxy.
---
Roy

