Another data point: the job starts fine on half the procs.

Derek


On Tue, Apr 9, 2013 at 8:26 PM, Derek Gaston <fried...@gmail.com> wrote:

> Is there any way to disable the Hilbert indexing for now?  With a serial
> mesh, can we just take the numbering from the node numbering?
>
>
> On Tue, Apr 9, 2013 at 8:21 PM, Derek Gaston <fried...@gmail.com> wrote:
>
>> serial
>>
>>
>> On Tue, Apr 9, 2013 at 8:21 PM, Kirk, Benjamin (JSC-EG311) <
>> benjamin.kir...@nasa.gov> wrote:
>>
>>> Serial or parallel mesh?
>>>
>>>
>>>
>>> On Apr 9, 2013, at 9:16 PM, "Kirk, Benjamin (JSC-EG311)" <
>>> benjamin.kir...@nasa.gov> wrote:
>>>
>>> > Hmm - I'll look through that section of code tomorrow morning and see
>>> > if there could possibly be any mismatched send/receives or anything.
>>> >
>>> > -Ben
>>> >
>>> > On Apr 9, 2013, at 8:48 PM, "Derek Gaston" <fried...@gmail.com> wrote:
>>> >
>>> >> Hey guys,
>>> >>
>>> >> I've got a fairly large job (>3500 procs) that is hanging while trying
>>> >> to set up the mesh.  The procs are in 2 separate places.  ~Half of them
>>> >> are here:
>>> >>
>>> >> #35 0x00002b1746aea7f0 in libMesh::Parallel::Communicator::send_receive<Hilbert::HilbertIndices> () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #36 0x00002b1746afc17b in void libMesh::MeshCommunication::find_global_indices<libMesh::MeshBase::element_iterator>(libMesh::MeshTools::BoundingBox const&, libMesh::MeshBase::element_iterator const&, libMesh::MeshBase::element_iterator const&, std::vector<unsigned int, std::allocator<unsigned int> >&) const () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #37 0x00002b1746c78528 in libMesh::Partitioner::partition_unpartitioned_elements(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #38 0x00002b1746c7b3e5 in libMesh::Partitioner::partition(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #39 0x00002b1746ad1d32 in libMesh::MeshBase::prepare_for_use(bool) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >>
>>> >>
>>> >> And the other ~half are here:
>>> >>
>>> >> #6  0x00002ba5c95c8b90 in libMesh::Parallel::Communicator::send_receive<unsigned int> () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #7  0x00002ba5c95da2e2 in void libMesh::MeshCommunication::find_global_indices<libMesh::MeshBase::element_iterator>(libMesh::MeshTools::BoundingBox const&, libMesh::MeshBase::element_iterator const&, libMesh::MeshBase::element_iterator const&, std::vector<unsigned int, std::allocator<unsigned int> >&) const () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #8  0x00002ba5c9756528 in libMesh::Partitioner::partition_unpartitioned_elements(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #9  0x00002ba5c97593e5 in libMesh::Partitioner::partition(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >>
>>> >>
>>> >>
>>> >> Obviously they are in slightly different spots...
>>> >>
>>> >>
>>> >> Any ideas on what's going on here or where to start looking?
>>> >>
>>> >> I was intermittently getting weird errors around this point from
>>> >> mvapich, so I tried switching to OpenMPI, and now it's hanging here.
>>> >>
>>> >> The mesh itself isn't enormous; it's only about 1 million nodes or
>>> >> so.  We've definitely done more than this before.
>>> >>
>>> >> Thanks in advance for any advice!
>>> >>
>>> >> Derek
>>> >>
>>> ------------------------------------------------------------------------------
>>> >> Precog is a next-generation analytics platform capable of advanced
>>> >> analytics on semi-structured data. The platform includes APIs for
>>> building
>>> >> apps and a phenomenal toolset for data science. Developers can use
>>> >> our toolset for easy data analysis & visualization. Get a free
>>> account!
>>> >> http://www2.precog.com/precogplatform/slashdotnewsletter
>>> >> _______________________________________________
>>> >> Libmesh-devel mailing list
>>> >> Libmesh-devel@lists.sourceforge.net
>>> >> https://lists.sourceforge.net/lists/listinfo/libmesh-devel
>>> >
>>> >
>>>
>>
>>
>
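On the question quoted above about taking the numbering straight from the existing ids on a serial mesh: here is a minimal sketch of what that could look like, not libMesh's implementation. The helper name `indices_from_ids` is hypothetical, and it assumes the caller only needs a contiguous numbering that every processor agrees on, with unique ids. Since a serial mesh is fully replicated, each processor can compute this locally with no send/receive at all. The ordering will generally differ from the Hilbert ordering, so any spatial locality that ordering provides is lost.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of libMesh: given the ids of the objects in
// some range, return for each one its position in the id-sorted ordering.
// On a fully replicated (serial) mesh every processor holds the same ids,
// so every processor computes the same result without any communication.
std::vector<unsigned int>
indices_from_ids(const std::vector<unsigned int>& ids)
{
  // Sorted copy of the ids; the position in this array is the new index.
  std::vector<unsigned int> sorted(ids);
  std::sort(sorted.begin(), sorted.end());

  std::vector<unsigned int> index(ids.size());
  for (std::size_t i = 0; i < ids.size(); ++i)
  {
    // Assumption: ids are unique, so lower_bound lands on the exact entry.
    const auto pos = std::lower_bound(sorted.begin(), sorted.end(), ids[i]);
    index[i] = static_cast<unsigned int>(pos - sorted.begin());
  }
  return index;
}
```

For example, ids {42, 7, 19, 3} would map to indices {3, 1, 2, 0} on every processor.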
