On Wed, 16 Apr 2008, Tim Kroeger wrote:
> Ah, there we are.  I always forgot that asserts do not hit in optimized mode.
> (Running in debug mode requires recompiling both libMesh and our institute
> software, since they share the same macros, so I avoid it whenever possible.)

Understandable; libMesh takes quite a while to compile, I'm told.  I
do library development in a lab with distcc on all the computers and
"make -j 30" in my shell scripts, so I'm afraid I usually don't notice
and I've got little incentive to improve things.  ;-)

>> Unfortunately so.  Parallel debugging is hard enough when you've got
>> the breaking app right in front of you; it's practically impossible by
>> proxy.
>
> You're right.  And I must admit that I found out that it was my fault.

That's reassuring; thanks for letting us know.  The twin terror of
"this apparently straightforward code is behaving in impossible ways"
and "I might have inadvertently broken SerialMesh code" wasn't good.

> You might want to know what I did, so I'll tell you:
>
> At some place in my code, I forgot to ALLREDUCE some important quantities.
> This led to an inconsistent refinement flagging of the grid over the
> different processors.  Hence, after the refinement, the processors were
> working on different grids.

That would do it.  I wonder if we should do something to help catch
such problems even in opt mode.  Testing a whole SerialMesh for
consistency might be overkill for an "optimized" code, but we might
take one message's latency hit to at least make sure that n_nodes()
and n_elem() are the same after a completed refinement.  Thoughts,
anyone?

> I wonder why my application code was running in earlier days.  That was
> indeed on a different cluster, but it was nevertheless a cluster.

Parallelism bugs can depend on how many processors you're using or
simply on how the partitioning unfolds.  It's not fun to debug when
you catch a failure on 32 cores on a fine mesh that doesn't
immediately whittle down to 2 cores on a coarse mesh.
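
To make the n_nodes()/n_elem() idea above a bit more concrete, here is
a rough sketch of how both counts could be compared in a single
reduction.  It is not libMesh code: MeshCounts, simulated_allreduce_max
and counts_consistent are hypothetical names, and the "processors" are
just entries in a vector so the sketch runs serially.  In a real
parallel code the stand-in function would presumably be one
MPI_Allreduce with MPI_MAX over the same 4-entry buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-processor summary; in a real code these values would
// come from calling n_nodes() and n_elem() on each processor's mesh.
struct MeshCounts {
  long n_nodes;
  long n_elem;
};

// Serial stand-in (an assumption for illustration) for a single
// max-reduction across processors: entry i of the result is the
// maximum of entry i over all per-processor buffers.
std::vector<long> simulated_allreduce_max(
    const std::vector<std::vector<long>>& per_rank) {
  assert(!per_rank.empty());
  std::vector<long> result = per_rank.front();
  for (const auto& buf : per_rank) {
    assert(buf.size() == result.size());
    for (std::size_t i = 0; i < buf.size(); ++i)
      result[i] = std::max(result[i], buf[i]);
  }
  return result;
}

// Packing both x and -x into the buffer lets one max-reduction recover
// the maximum and the minimum, since max(-x) == -min(x).  The counts
// agree on every processor exactly when max == min.
bool counts_consistent(const std::vector<MeshCounts>& ranks) {
  std::vector<std::vector<long>> buffers;
  for (const auto& c : ranks)
    buffers.push_back({c.n_nodes, -c.n_nodes, c.n_elem, -c.n_elem});
  const std::vector<long> r = simulated_allreduce_max(buffers);
  // r[0] is the max node count and -r[1] the min; likewise r[2], r[3]
  // for elements.
  return r[0] == -r[1] && r[2] == -r[3];
}
```

Calling counts_consistent on the counts gathered after a refinement
step returns false as soon as one processor disagrees with the others.
Matching counts are only a necessary condition, of course: two
processors could still hold different grids with the same number of
nodes and elements, so this catches the cheap cases rather than
replacing a full consistency test.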
---
Roy
