On Mon, 27 Feb 2017, Barna Becsek wrote:

> I hope this does not come too late.

That depends entirely on your deadlines, I fear. ;-)

> But I finally got the debugger to work with some help of support on
> the Supercomputer that we are using. This is the complete stack:
>
> To recap the problem:
> We were reading in a mesh that was created outside of libmesh
> (rectilinear) and wanted to use that in the context of the MOOSE
> framework. Before calling the function prepare_for_use we wanted to
> make sure that the ghost elements for the neighbouring processes
> were found. This seemed to work fine when compiled and run in opt
> mode but caused the code to get stuck when done so in dbg mode (no
> crash, just a running but hung-up executable). So this is the
> complete stack of what is happening. It seems like MPI_Probe is
> blocking the code for some reason that it is not getting its message
> on one of the processes.
> Have you ever come across something like that?

I can't perfectly parse that stack (it appears to have been generated
using an older version of mesh_tools.C), but it appears that the max()
is what's blocking, and the probe() is further ahead in the code.

I've never seen this happen, but I can guess one way it might: if the
processors don't agree on mesh.max_elem_id(), then whichever
processor(s) don't see the largest ID would exit that loop too early
and leave the others hanging.

Your mesh having an incorrect max_elem_id() right before
prepare_for_use() isn't a bug and won't affect a real run, since we
update_parallel_id_counts() from within prepare_for_use(), after which
point every processor should know the correct max_elem_id(). It is
theoretically possible that this failure is *masking* a real bug,
though, so it's definitely worth fixing.

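To make that guess concrete, here is a minimal serial sketch of the mismatch, with no MPI and no libMesh involved. It assumes, as a simplification, that each rank makes one collective call per loop iteration, so a rank's loop bound equals the number of collectives it joins. All names here (id_type, agreed_max, ranks_exiting_early, ranks_stay_in_sync) are hypothetical helpers for illustration, not libMesh API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for libMesh's dof_id_type.
using id_type = std::size_t;

// The value every rank would agree on after a global max reduction
// over the per-rank (possibly stale) max element IDs.
id_type agreed_max(const std::vector<id_type>& local_max_ids)
{
  assert(!local_max_ids.empty());
  return *std::max_element(local_max_ids.begin(), local_max_ids.end());
}

// Ranks whose local bound is below the global max. In the simplified
// model these leave the loop first, while the remaining ranks keep
// waiting inside a collective that the early ranks never join.
std::vector<std::size_t>
ranks_exiting_early(const std::vector<id_type>& local_max_ids)
{
  const id_type global_max = agreed_max(local_max_ids);
  std::vector<std::size_t> early;
  for (std::size_t rank = 0; rank != local_max_ids.size(); ++rank)
    if (local_max_ids[rank] < global_max)
      early.push_back(rank);
  return early;
}

// True if all ranks use the same loop bound, i.e. they would all make
// the same number of collective calls and nobody is left hanging.
bool ranks_stay_in_sync(const std::vector<id_type>& loop_bounds)
{
  return std::adjacent_find(loop_bounds.begin(), loop_bounds.end(),
                            std::not_equal_to<id_type>())
         == loop_bounds.end();
}
```

With local bounds {10, 12, 12, 9}, ranks 0 and 3 would exit early and ranks_stay_in_sync reports false; if every rank instead loops to agreed_max of those values, the bounds match and the check passes.
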
My suggestion would be to replace

  for (dof_id_type i=0; i != mesh.max_elem_id(); ++i)

with

  const dof_id_type max_id = mesh.parallel_max_elem_id();
  for (dof_id_type i=0; i != max_id; ++i)

in libmesh_assert_valid_neighbors in mesh_tools.C and if that works for
you then you can put in a PR, or just let me know and I'll do so.

Sorry about the hassle! This wasn't an intended use case of
libmesh_assert_valid_neighbors(), but it *should* have been a
supported use case. I hate finding bugs in our debugging code. I
refuse to write debugging-code-debugging code.
---
Roy