Did anyone sleep on this and come up with any ideas to try?

One thing to note is that we are still reading the mesh on every
processor (because of the block / sideset naming issue that Cody only
recently fixed).  Do you believe that could be part of the problem?

Currently I can't run on more than about 1,700 procs without hitting this
hang.  I'm recompiling with a new version of mvapich and hoping that fixes
it, but I'd like to know if there is anything else I can try.
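
Related to Ben's point about mismatched send/receives, here is a minimal
sketch of an offline check for a pairwise exchange schedule. It needs no MPI.
It assumes a ring-shift pattern (at step s, rank r sends to (r + s) % n and
receives from (r - s + n) % n); whether find_global_indices pairs ranks
exactly this way is an assumption. The names Exchange, ring_schedule and
schedule_is_matched are hypothetical and not part of libMesh.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One planned exchange for a rank at a given step: who it sends to and
// who it expects to receive from. (Hypothetical type, not a libMesh one.)
struct Exchange {
  std::size_t dest;
  std::size_t source;
};

// Assumed ring-shift schedule: at step s, rank r sends to (r + s) % n
// and receives from (r - s + n) % n.
std::vector<std::vector<Exchange>> ring_schedule(std::size_t n) {
  assert(n > 0);
  std::vector<std::vector<Exchange>> plan(n);
  for (std::size_t r = 0; r < n; ++r)
    for (std::size_t s = 0; s < n; ++s)
      plan[r].push_back({(r + s) % n, (r + n - s) % n});
  return plan;
}

// Returns true when every rank plans the same number of steps and, at
// each step, each rank's send is matched by the receive that its
// destination has planned for that same step. A false result is the kind
// of mismatch that can leave blocking exchanges waiting forever.
bool schedule_is_matched(const std::vector<std::vector<Exchange>>& plan) {
  const std::size_t n = plan.size();

  // All ranks must agree on the number of steps before indexing by step.
  for (std::size_t r = 0; r < n; ++r)
    if (plan[r].size() != plan[0].size()) return false;

  for (std::size_t r = 0; r < n; ++r) {
    for (std::size_t s = 0; s < plan[r].size(); ++s) {
      const std::size_t d = plan[r][s].dest;
      if (d >= n) return false;                   // destination out of range
      if (plan[d][s].source != r) return false;   // receiver expects someone else
    }
  }
  return true;
}
```

Building the schedule for a given rank count and passing it to
schedule_is_matched would at least rule out a pairing error in the pattern
itself, separate from anything the MPI library is doing.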

Derek



On Tue, Apr 9, 2013 at 8:27 PM, Derek Gaston <fried...@gmail.com> wrote:

> Another data point... job starts fine on half the procs....
>
> Derek
>
>
> On Tue, Apr 9, 2013 at 8:26 PM, Derek Gaston <fried...@gmail.com> wrote:
>
>> Is there any way to disable the Hilbert indexing for now?  With a serial
>> mesh, can we just take the numbering from the node numbering?
>>
>>
>> On Tue, Apr 9, 2013 at 8:21 PM, Derek Gaston <fried...@gmail.com> wrote:
>>
>>> serial
>>>
>>>
>>> On Tue, Apr 9, 2013 at 8:21 PM, Kirk, Benjamin (JSC-EG311) <
>>> benjamin.kir...@nasa.gov> wrote:
>>>
>>>> Serial or parallel mesh?
>>>>
>>>>
>>>>
>>>> On Apr 9, 2013, at 9:16 PM, "Kirk, Benjamin (JSC-EG311)" <
>>>> benjamin.kir...@nasa.gov> wrote:
>>>>
>>>> > Hmm - I'll look through that section of code tomorrow morning and see
>>>> if there could possibly be any mismatched send/receives or anything.
>>>> >
>>>> > -Ben
>>>> >
>>>> > On Apr 9, 2013, at 8:48 PM, "Derek Gaston" <fried...@gmail.com>
>>>> wrote:
>>>> >
>>>> >> Hey guys,
>>>> >>
>>>> >> I've got a fairly large job (>3500 procs) that is hanging while
>>>> >> trying to set up the mesh.  The procs are in 2 separate places.
>>>> >> ~Half of them are here:
>>>> >>
>>>> >> #35 0x00002b1746aea7f0 in
>>>> libMesh::Parallel::Communicator::send_receive<Hilbert::HilbertIndices> ()
>>>> from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >> #36 0x00002b1746afc17b in void
>>>> libMesh::MeshCommunication::find_global_indices<libMesh::MeshBase::element_iterator>(libMesh::MeshTools::BoundingBox
>>>> const&, libMesh::MeshBase::element_iterator const&,
>>>> libMesh::MeshBase::element_iterator const&, std::vector<unsigned int,
>>>> std::allocator<unsigned int> >&) const () from
>>>> /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >> #37 0x00002b1746c78528 in
>>>> libMesh::Partitioner::partition_unpartitioned_elements(libMesh::MeshBase&,
>>>> unsigned int) () from
>>>> /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >> #38 0x00002b1746c7b3e5 in
>>>> libMesh::Partitioner::partition(libMesh::MeshBase&, unsigned int) () from
>>>> /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >> #39 0x00002b1746ad1d32 in libMesh::MeshBase::prepare_for_use(bool)
>>>> () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >>
>>>> >>
>>>> >> And the other ~half are here:
>>>> >>
>>>> >> #6  0x00002ba5c95c8b90 in
>>>> libMesh::Parallel::Communicator::send_receive<unsigned int> () from
>>>> /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >> #7  0x00002ba5c95da2e2 in void
>>>> libMesh::MeshCommunication::find_global_indices<libMesh::MeshBase::element_iterator>(libMesh::MeshTools::BoundingBox
>>>> const&, libMesh::MeshBase::element_iterator const&,
>>>> libMesh::MeshBase::element_iterator const&, std::vector<unsigned int,
>>>> std::allocator<unsigned int> >&) const () from
>>>> /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >> #8  0x00002ba5c9756528 in
>>>> libMesh::Partitioner::partition_unpartitioned_elements(libMesh::MeshBase&,
>>>> unsigned int) () from
>>>> /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >> #9  0x00002ba5c97593e5 in
>>>> libMesh::Partitioner::partition(libMesh::MeshBase&, unsigned int) () from
>>>> /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>>> >>
>>>> >>
>>>> >>
>>>> >> Obviously they are in slightly different spots...
>>>> >>
>>>> >>
>>>> >> Any ideas on what's going on here or where to start looking?
>>>> >>
>>>> >> I was intermittently getting weird errors around this point from
>>>> mvapich so I've tried to switch to OpenMPI.... and it's hanging up here.
>>>> >>
>>>> >> The mesh itself isn't enormous... it's only about 1 million nodes or
>>>> so.  We've definitely done more than this before.
>>>> >>
>>>> >> Thanks in advance for any advice!
>>>> >>
>>>> >> Derek
>>>> >>
>>>> ------------------------------------------------------------------------------
>>>> >> Precog is a next-generation analytics platform capable of advanced
>>>> >> analytics on semi-structured data. The platform includes APIs for
>>>> building
>>>> >> apps and a phenomenal toolset for data science. Developers can use
>>>> >> our toolset for easy data analysis & visualization. Get a free
>>>> account!
>>>> >> http://www2.precog.com/precogplatform/slashdotnewsletter
>>>> >> _______________________________________________
>>>> >> Libmesh-devel mailing list
>>>> >> Libmesh-devel@lists.sourceforge.net
>>>> >> https://lists.sourceforge.net/lists/listinfo/libmesh-devel
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>