Could you describe the configure options that gave rise to this? I'm not seeing it with my default stuff. Parallel mesh or what?
Let me know how you configured and I'll see if I can replicate.

-Ben

On Nov 9, 2012, at 1:35 AM, "Roy Stogner" <royst...@ices.utexas.edu> wrote:

> On Thu, 8 Nov 2012, Roy Stogner wrote:
>
>> I'm seeing hangs on our BuildBot servers when running the
>> "-online_mode 1" step of reduced_basis_ex6 with --n_threads=2 in the
>> LIBMESH_OPTIONS. That may just be because our BuildBot server is
>> ridiculously overloaded, and I'll see if I can verify it manually when
>> I get time, but before I dig into things it occurs to me that if you
>> run multithreaded yourself then I can be more confident that I'm
>> seeing a false positive.
>
> It's definitely *not* a false positive - I managed to hang
> reduced_basis_ex6 today manually while testing out a new laptop. It's
> also not a threading issue; this was a 4-MPI-task, 1-thread-each run.
>
> I'm not sure what the issue is; maybe something with recent I/O
> changes? Stack traces (from processes interrupted in some kind of
> busy-loop) include three processes caught here:
>
> (gdb) where
> #0  0x00002b7b3612e034 in ?? () from /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so
> #1  0x00002b7b2b40d46a in opal_progress () from /usr/lib/libopen-pal.so.0
> #2  0x00002b7b1d953595 in ?? () from /usr/lib/libmpi.so.0
> #3  0x00002b7b371df33f in ?? () from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
> #4  0x00002b7b1d968260 in PMPI_Allreduce () from /usr/lib/libmpi.so.0
> #5  0x00002b7b2469066b in VecAssemblyBegin_MPI(_p_Vec*) () from /usr/lib/libpetsc.so.3.2
> #6  0x00002b7b247baa53 in VecAssemblyBegin () from /usr/lib/libpetsc.so.3.2
> #7  0x00002b7b1a2604df in libMesh::PetscVector<double>::close (this=0x2a8c7b0)
>     at /home/roystgnr/libmesh/svn/include/libmesh/petsc_vector.h:910
> #8  0x00002b7b1a407dd3 in libMesh::System::read_serialized_vector (this=this@entry=0x2a82970,
>     io=..., vec=...) at src/systems/system_io.C:1333
> #9  0x00002b7b1a408af8 in libMesh::System::read_serialized_data (this=0x2a82970, io=...,
>     read_additional_data=<optimized out>) at src/systems/system_io.C:719
> #10 0x00002b7b1a31d63b in libMesh::RBEvaluation::read_in_basis_functions (this=0x7fff9b94c490,
>     sys=..., directory_name=..., read_binary_basis_functions=true)
>     at src/reduced_basis/rb_evaluation.C:960
> #11 0x000000000042c736 in main (argc=13, argv=0x7fff9b94c978) at reduced_basis_ex6.C:227
>
> and one caught here:
>
> #0  0x00002ba8cf0ebfb9 in ?? () from /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so
> #1  0x00002ba8c43cb46a in opal_progress () from /usr/lib/libopen-pal.so.0
> #2  0x00002ba8b6911595 in ?? () from /usr/lib/libmpi.so.0
> #3  0x00002ba8d019d33f in ?? () from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
> #4  0x00002ba8b6926260 in PMPI_Allreduce () from /usr/lib/libmpi.so.0
> #5  0x00002ba8b2b9088a in libMesh::Parallel::Communicator::min<unsigned long> (
>     this=this@entry=0x2ba8b383c520 <libMesh::CommWorld>, r=@0x7fffc214a840: 1)
>     at /home/roystgnr/libmesh/svn/include/libmesh/parallel_implementation.h:1148
> #6  0x00002ba8b32571e3 in libMesh::Parallel::Communicator::verify<unsigned long> (
>     this=this@entry=0x2ba8b383c520 <libMesh::CommWorld>, r=@0x7fffc214a898: 1)
>     at /home/roystgnr/libmesh/svn/include/libmesh/parallel_implementation.h:1114
> #7  0x00002ba8b32581bb in verify<unsigned long> (r=@0x7fffc214a898: 1,
>     this=0x2ba8b383c520 <libMesh::CommWorld>)
>     at /home/roystgnr/libmesh/svn/include/libmesh/parallel_implementation.h:1111
> #8  libMesh::Parallel::Communicator::sum<unsigned int> (this=0x2ba8b383c520 <libMesh::CommWorld>,
>     r=...)
>     at /home/roystgnr/libmesh/svn/include/libmesh/parallel_implementation.h:1606
> #9  0x00002ba8b33cdfc9 in sum<std::vector<unsigned int> > (comm=..., r=...)
>     at /home/roystgnr/libmesh/svn/include/libmesh/parallel_implementation.h:775
> #10 libMesh::System::read_serialized_blocked_dof_objects<libMesh::MeshBase::node_iterator> (
>     this=this@entry=0x1c62d00, n_objects=n_objects@entry=6171, begin=..., end=..., io=...,
>     vec=..., var_to_read=var_to_read@entry=1) at src/systems/system_io.C:969
> #11 0x00002ba8b33c5bd5 in libMesh::System::read_serialized_vector (this=this@entry=0x1c62d00,
>     io=..., vec=...) at src/systems/system_io.C:1305
> #12 0x00002ba8b33c6af8 in libMesh::System::read_serialized_data (this=0x1c62d00, io=...,
>     read_additional_data=<optimized out>) at src/systems/system_io.C:719
> #13 0x00002ba8b32db63b in libMesh::RBEvaluation::read_in_basis_functions (this=0x7fffc214bf10,
>     sys=..., directory_name=..., read_binary_basis_functions=true)
>     at src/reduced_basis/rb_evaluation.C:960
> #14 0x000000000042c736 in main (argc=13, argv=0x7fffc214c3f8) at reduced_basis_ex6.C:227
>
> I can tell that I need to make Parallel::verify() more robust if
> possible, but I haven't had time yet to figure out the underlying
> problem that verify() (presumably a parallel_only() call) should have
> caught.
> ---
> Roy
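For reference, the traces above appear to show the classic mismatched-collective deadlock: three ranks are blocked in one PMPI_Allreduce call site (via VecAssemblyBegin) while the fourth is blocked in a different one (via Parallel::Communicator::sum), so no reduction can complete. Below is a minimal sketch of the kind of min/max consistency check the Communicator::min frame in the verify() trace suggests. It is written against raw MPI with hypothetical names (verify_consistent, n_chunks), not libMesh's actual implementation, and illustrates how agreeing on a count before a loop of collectives can turn this kind of hang into a clean abort.

    // Minimal sketch (not libMesh's actual code) of a verify-style
    // consistency check: reduce a value with MIN and MAX across all
    // ranks and confirm both results match the local value.
    #include <mpi.h>
    #include <cstdio>

    // Collective: every rank must call this with its local value, or the
    // Allreduce calls below hang in exactly the shape of the traces above.
    static bool verify_consistent(unsigned long local, MPI_Comm comm)
    {
      unsigned long global_min = 0, global_max = 0;
      MPI_Allreduce(&local, &global_min, 1, MPI_UNSIGNED_LONG, MPI_MIN, comm);
      MPI_Allreduce(&local, &global_max, 1, MPI_UNSIGNED_LONG, MPI_MAX, comm);
      return global_min == local && global_max == local;
    }

    int main(int argc, char** argv)
    {
      MPI_Init(&argc, &argv);
      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      // Hypothetical stand-in for a per-rank computed value, e.g. the number
      // of I/O chunks each rank expects to loop over, one collective apiece.
      unsigned long n_chunks = 4;

      // If ranks disagreed on n_chunks, some would leave the collective loop
      // early and block in whatever collective comes next; checking up front
      // fails fast instead of deadlocking partway through.
      if (!verify_consistent(n_chunks, MPI_COMM_WORLD))
        {
          std::fprintf(stderr, "rank %d: inconsistent collective count\n", rank);
          MPI_Abort(MPI_COMM_WORLD, 1);
        }

      MPI_Finalize();
      return 0;
    }

Note the limitation that likely motivates the "more robust if possible" remark: a check like this is itself collective, so it can flag ranks that disagree about a value, but a rank that never reaches the call at all will still hang everyone else.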