On Fri, 13 Mar 2009, Tim Kroeger wrote:

> Could you please check whether this enables you to reproduce the crash? You
> need to run the program on 8 processors (with ghosted enabled of course). I
> used METHOD=devel, but I guess it will crash in the other modes as well.

Well, it crashes in the other modes, but in opt mode doing crash
forensics is effectively impossible and in dbg mode the crash takes
forever to reach. Curses to whoever is responsible for _GLIBCXX_DEBUG
changing the algorithmic complexity of std::sort, and thanks again to
John for reminding us about devel mode. gcc is lousy at debugging
devel mode binaries, but it's much better than nothing.

Anyway, the crash turned out to have a couple of bugs behind it:

1. When I wrote enforce_constraints_exactly years ago, I apparently
   understood that each processor only had to set its local
   constrained dof values, and I wrote a comment to that effect... but
   apparently I never wrote *code* to that effect! So we were
   potentially trying to set constraint values we didn't own, which
   could depend on dof values we didn't know. An extra half-line of
   code fixed that.

2. When I wrote add_constraints_to_send_list days ago, I assumed it
   was being called after process_recursive_constraints(); I forgot
   that the latter had to be called late, after user constraints have
   had a chance to be applied. So we didn't always have all the
   necessary data when setting constraint values we did own. To fix
   that, I'm now doing these things (as well as sorting the send list)
   by hand in System::init_data() and EquationSystems::reinit() to
   ensure they happen in the right order. I really don't like the lack
   of modularity that implies, but I couldn't figure out how to do
   things in DofMap without nonintuitive API behavior or a redundant
   sort_send_list() operation.

This looks like it may have been a pretty nasty bug. Under some
conditions it caused ghosted vectors to crash, but I'd expect that
under slightly rarer conditions it would cause serial vectors to
calculate incorrect constrained dof values.
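
As an illustration of the first fix, here is a minimal standalone
sketch of an ownership check. It is not the libMesh code:
ConstraintRow, enforce_local_constraints and the
[first_local, end_local) range are hypothetical names, the vector is
a plain std::vector rather than a ghosted one, and the sketch assumes
constraint rows have already been expanded so that they reference
only unconstrained dofs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a constraint row: the constrained dof's value
// is a weighted sum of the values of the dofs it depends on.
typedef std::map<std::size_t, double> ConstraintRow;

// Overwrite each constrained dof with the value its row implies, but only
// for dofs in this processor's owned range [first_local, end_local).
// Returns the number of dofs that were overwritten.
std::size_t enforce_local_constraints(
    std::vector<double>& values,
    const std::map<std::size_t, ConstraintRow>& constraints,
    std::size_t first_local,
    std::size_t end_local)
{
  assert(first_local <= end_local && end_local <= values.size());

  std::size_t n_set = 0;
  for (std::map<std::size_t, ConstraintRow>::const_iterator
         c = constraints.begin(); c != constraints.end(); ++c)
  {
    const std::size_t dof = c->first;

    // The ownership check: skip constrained dofs owned elsewhere, since
    // their rows may reference values this processor never received.
    if (dof < first_local || dof >= end_local)
      continue;

    double v = 0.0;
    for (ConstraintRow::const_iterator t = c->second.begin();
         t != c->second.end(); ++t)
      v += t->second * values.at(t->first);  // at() throws on a bad index

    values[dof] = v;
    ++n_set;
  }
  return n_set;
}
```

With values {1, 3, 0, 0}, a constraint on dof 2 averaging dofs 0 and
1, a constraint on dof 3, and an owned range of [0, 3), only dof 2 is
overwritten; dof 3 is left for whichever processor owns it.
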
This wouldn't affect my FEMSystem code (which only calls
enforce_constraints_exactly on parallel vectors), so I'd have never
noticed, but it could kill the accuracy of anyone whose code combined
TransientSystem, parallel AMR, and bad luck!

Anyway, I've checked the fixes into SVN; now might be a good time for
those of us on the bleeding edge to update.

> By the way, I observed another very strange thing: If I change the values of
> {x,y,z}{min,max} of the start grid (as in the comments of the test program),
> it crashes already on the first refinement step and at a completely different
> point, that is in elem.h, line 1744. (That's the assert in
> Elem::compute_key() with four arguments.) That does not make any sense at
> all to me.

The error doesn't seem to make sense, but then neither does your
function call. ;-) You got confused about parameter order, and passed
in xmin=xmax=0.0 and zmin=zmax=1.0. libMesh gets completely confused
when two distinct nodes overlap - you had *every* node overlapping
many others. ;-)

> Anyway, complete confusion is a good state to start vacations with, isn't it?

Well, I hope "it's probably fixed now" is a good way to come back.
---
Roy

_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel