On Fri, 13 Mar 2009, Tim Kroeger wrote:

> Could you please check whether this enables you to reproduce the crash?  You 
> need to run the program on 8 processors (with ghosted enabled of course).  I 
> used METHOD=devel, but I guess it will crash in the other modes as well.

Well, it crashes in the other modes, but in opt mode doing crash
forensics is effectively impossible and in dbg mode the crash takes
forever to reach.  Curses to whoever is responsible for _GLIBCXX_DEBUG
changing the algorithmic complexity of std::sort, and thanks again to
John for reminding us about devel mode.  gdb is lousy at debugging
devel mode binaries, but it's much better than nothing.


Anyway, the crash turned out to have a couple of bugs behind it:

1.  When I wrote enforce_constraints_exactly years ago, I apparently
understood that each processor only had to set its local constrained
dof values, and I wrote a comment to that effect... but apparently I
never wrote *code* to that effect!  So we were potentially trying to
set constraint values we didn't own, which could depend on dof values
we didn't know.  An extra half-line of code fixed that.
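
For anyone who wants to see the idea of that ownership check in
isolation, here is a minimal standalone sketch.  It is not the libMesh
code: ConstraintRow, enforce_local_constraints, first_local and
end_local are all hypothetical names, and a plain std::vector stands in
for a numeric vector.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for one constraint row:
// value[dof] = sum over (other, coef) of coef * value[other].
typedef std::map<std::size_t, double> ConstraintRow;

// Set constrained values only for dofs in [first_local, end_local).
// Returns how many constrained dofs this "processor" actually set.
std::size_t enforce_local_constraints(
    const std::map<std::size_t, ConstraintRow>& constraints,
    std::size_t first_local, std::size_t end_local,
    std::vector<double>& values)
{
  assert(first_local <= end_local && end_local <= values.size());

  std::size_t n_set = 0;
  for (const auto& c : constraints)
    {
      const std::size_t dof = c.first;

      // The ownership check: skip constrained dofs we do not own,
      // since their rows may depend on values we do not have.
      if (dof < first_local || dof >= end_local)
        continue;

      double v = 0.0;
      for (const auto& term : c.second)
        v += term.second * values[term.first];

      values[dof] = v;
      ++n_set;
    }
  return n_set;
}
```

Without the ownership test, the loop would also try to set values for
constrained dofs owned by other processors.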

2.  When I wrote add_constraints_to_send_list days ago, I assumed it
was being called after process_recursive_constraints(); I forgot that
the latter had to be called late, after user constraints have had a
chance to be applied.  So we didn't always have all the necessary data
when setting constraint values we did own.  To fix that, I'm now doing
these things (as well as sorting the send list) by hand in
System::init_data() and EquationSystems::reinit() to ensure they
happen in the right order.  I really don't like the lack of modularity
that implies, but I couldn't figure out how to do things in DofMap
without nonintuitive API behavior or a redundant sort_send_list()
operation.
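
To illustrate why the order matters, here is a toy standalone sketch,
not the DofMap implementation.  Row, Constraints, expand_recursive and
build_send_list are hypothetical names, constraints are assumed to be
acyclic, and the "send list" is modeled simply as the sorted set of
non-local dofs that local constraint rows depend on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

typedef std::map<std::size_t, double> Row;     // other dof -> coefficient
typedef std::map<std::size_t, Row> Constraints; // constrained dof -> row

// Toy recursive-constraint processing: substitute rows into each other
// until no row refers to another constrained dof (assumes no cycles).
void expand_recursive(Constraints& cs)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (auto& c : cs)
        {
          Row expanded;
          for (const auto& term : c.second)
            {
              Constraints::const_iterator it = cs.find(term.first);
              if (it == cs.end())
                expanded[term.first] += term.second;
              else
                {
                  changed = true;
                  for (const auto& sub : it->second)
                    expanded[sub.first] += term.second * sub.second;
                }
            }
          c.second = expanded;
        }
    }
}

// Collect the non-local dofs that local constrained dofs depend on,
// then sort and remove duplicates.
std::vector<std::size_t> build_send_list(const Constraints& cs,
                                         std::size_t first_local,
                                         std::size_t end_local)
{
  assert(first_local <= end_local);
  std::vector<std::size_t> send_list;
  for (const auto& c : cs)
    {
      if (c.first < first_local || c.first >= end_local)
        continue;
      for (const auto& term : c.second)
        if (term.first < first_local || term.first >= end_local)
          send_list.push_back(term.first);
    }
  std::sort(send_list.begin(), send_list.end());
  send_list.erase(std::unique(send_list.begin(), send_list.end()),
                  send_list.end());
  return send_list;
}
```

If dof 1 is local and constrained by dof 2, and dof 2 is itself
constrained by dof 5, then calling build_send_list before
expand_recursive records dof 2, while calling it afterwards records
dof 5, which is the value actually needed.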


This looks like it may have been a pretty nasty bug.  Under some
conditions it caused ghosted vectors to crash, but I'd expect that
under slightly rarer conditions it would cause serial vectors to
calculate incorrect constrained dof values.  This wouldn't affect my
FEMSystem code (which only calls enforce_constraints_exactly on
parallel vectors), so I'd have never noticed, but it could kill the
accuracy of anyone whose code combined TransientSystem, parallel AMR,
and bad luck!!


Anyway, I've checked the fixes into SVN; now might be a good time for
those of us on the bleeding edge to update.


> By the way, I observed another very strange thing: If I change the values of 
> {x,y,z}{min,max} of the start grid (as in the comments of the test program), 
> it crashes already on the first refinement step and at a completely different 
> point, that is in elem.h, line 1744. (That's the assert in 
> Elem::compute_key() with four arguments.)  That does not make any sense at 
> all to me.

The error doesn't seem to make sense, but then neither does your
function call.  ;-)  You got confused about parameter order, and
passed in xmin=xmax=0.0 and zmin=zmax=1.0.  libMesh gets completely
confused when two distinct nodes overlap - you had *every* node
overlapping many others.  ;-)
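
A cheap guard against this kind of mix-up is to check the extents
before building the grid.  The sketch below is a hypothetical helper,
not something libMesh provides; degenerate_axes and example_check are
made-up names, and the y range in the example is an assumption since
only the x and z values are quoted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the axes whose extent is empty or
// inverted, e.g. "xz" when xmin >= xmax and zmin >= zmax.
std::string degenerate_axes(double xmin, double xmax,
                            double ymin, double ymax,
                            double zmin, double zmax)
{
  std::string bad;
  if (!(xmin < xmax)) bad += 'x';
  if (!(ymin < ymax)) bad += 'y';
  if (!(zmin < zmax)) bad += 'z';
  return bad;
}

// Example with xmin=xmax=0.0 and zmin=zmax=1.0 (y range assumed).
inline void example_check()
{
  assert(degenerate_axes(0.0, 0.0, 0.0, 1.0, 1.0, 1.0) == "xz");
}
```

Calling something like this just before generating the start grid
would flag swapped arguments long before any node ends up on top of
another.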

> Anyway, complete confusion is a good state to start vacations with, isn't it?

Well, I hope "it's probably fixed now" is a good way to come back.
---
Roy
