Well, I eliminated PETSc and have been linking to MPI using
--with-mpi=$MPI_DIR, and playing with the refinement example I had mentioned
earlier to try to rule out ParMETIS as the cause of the hang/crash issue. In
these configs I attach either the LinearPartitioner or an SFC partitioner in
prepare_for_use right before calling partition(). This causes asserts to trip
in MeshCommunication::redistribute, where elem.proc_id != proc_id while unpacking
elems (stack below). These asserts trigger on slightly smaller square
meshes than in the original issue: 3^2 initial elems with SFC, and 4^2 initial
elems with Linear.
At this point I wasn't sure about partitioner support in these MPI builds. Is attaching
a partitioner in prepare_for_use OK, or is there some setup stage I'm
missing? If it is OK, there seems to be very little that touches the mesh before
this point, so it looks like something is already off at
_refine_elements(), since all of this seems separate from the partitioner. I
tried investigating some of the make_elems_parallel_consistent calls and
the libmesh_assert_valid_parallel_ids() call right after, but so far no luck.
One nagging issue I have is with us using PETSc flags for
MPI. In the PETSc build scripts I was originally using, we had to pass
an extra -lpmi in the PETSc LDFLAGS on the local cluster. The recent
gcc 7.2/MVAPICH2 upgrade came with pmi2, which I also have to pass to Slurm, so
in the PETSc builds I now supply -lpmi2. In the standalone MPI builds I
tried exporting libmesh_LDFLAGS and libmesh_LIBS to link against this
library, but I was not sure whether it was picked up, since -lpmi2 didn't show up in
libmesh_optional_LIBS in the configure summary the way it does when linking
through PETSc. I'm quite unfamiliar with the PMI library in general, but I
still have a lingering fear that all of this could somehow stem from it.
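For reference, a sketch of a configure line combining the flags discussed in this thread, plus a quick check on the built library. It assumes the standard autoconf convention that LIBS can be passed as a VAR=VALUE argument to configure; whether libMesh then echoes it in libmesh_optional_LIBS is not verified here. The library path is a placeholder.

```shell
# Sketch only: --with-mpi and --enable-everything are the flags used above;
# --disable-strict-lgpl is what Roy suggests for enabling the SFC partitioner.
# LIBS is a standard autoconf variable (assumption: libMesh's configure
# forwards it to its link line).
./configure --with-mpi="$MPI_DIR" --enable-everything --disable-strict-lgpl \
            LIBS="-lpmi2"

# Check whether the built library resolves a PMI dependency.
# The path below is a placeholder for the actual install location.
ldd /path/to/libmesh/lib/libmesh_opt.so | grep -i pmi
```

Note that ldd lists transitive dependencies, so a hit may come from the MPI library itself rather than from libMesh's own link line.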
Thanks for any info you can provide,
Boris
Stack Trace
===========
#0 __cxxabiv1::__cxa_throw (obj=obj@entry=0x9040e0,
tinfo=0x407e68 <typeinfo for libMesh::LogicError>,
tinfo@entry=0x7ffff76337b0 <typeinfo for libMesh::LogicError>,
dest=0x403250 <libMesh::LogicError::~LogicError()>,
dest@entry=0x7ffff6259370 <libMesh::LogicError::~LogicError()>)
at ../../../../gcc/libstdc++-v3/libsupc++/eh_throw.cc:75
#1 0x00007ffff6a98f8a in
libMesh::Parallel::Packing<libMesh::Elem*>::unpack<__gnu_cxx::__normal_iterator<unsigned long const*,
std::vector<unsigned long, std::allocator<unsigned long> > >, libMesh::MeshBase> (in=...,
mesh=mesh@entry=0x64a0e0) at ../source/src/parallel/parallel_elem.C:474
#2 0x00007ffff6a995c5 in
libMesh::Parallel::Packing<libMesh::Elem*>::unpack<__gnu_cxx::__normal_iterator<unsigned long const*,
std::vector<unsigned long, std::allocator<unsigned long> > >, libMesh::DistributedMesh> (in=...,
in@entry=987654321, mesh=mesh@entry=0x64a0e0)
at ../source/src/parallel/parallel_elem.C:814
#3 0x00007ffff688f8ab in unpack_range<libMesh::DistributedMesh, unsigned long,
libMesh::mesh_inserter_iterator<libMesh::Elem>, libMesh::Elem*> (out_iter=...,
context=<optimized out>,
buffer=std::vector of length 195733, capacity 195733 = {...})
at ./include/libmesh/parallel_implementation.h:607
#4 libMesh::Parallel::Communicator::receive_packed_range<libMesh::DistributedMesh,
libMesh::mesh_inserter_iterator<libMesh::Elem>, libMesh::Elem*> (
this=0x649188, src_processor_id=src_processor_id@entry=4294967294,
context=context@entry=0x64a0e0, out_iter=out_iter@entry=...,
output_type=output_type@entry=0x0, tag=...)
at ./include/libmesh/parallel_implementation.h:2761
#5 0x00007ffff687d70e in libMesh::MeshCommunication::redistribute (
this=this@entry=0x7fffffff9cdf, mesh=...,
newly_coarsened_only=newly_coarsened_only@entry=false)
at ../source/src/mesh/mesh_communication.C:500
#6 0x00007ffff67ef2f2 in libMesh::DistributedMesh::redistribute (
this=0x64a0e0) at ../source/src/mesh/distributed_mesh.C:835
#7 0x00007ffff6acb1e6 in libMesh::Partitioner::partition (
this=<optimized out>, mesh=..., n=<optimized out>)
at ../source/src/partitioning/partitioner.C:85
#8 0x00007ffff685aa6e in libMesh::MeshBase::partition (
this=this@entry=0x64a0e0, n_parts=2) at
../source/src/mesh/mesh_base.C:485
#9 0x00007ffff685f8fb in partition (this=0x64a0e0)
at ./include/libmesh/mesh_base.h:728
#10 libMesh::MeshBase::prepare_for_use (this=0x64a0e0,
skip_renumber_nodes_and_elements=skip_renumber_nodes_and_elements@entry=false,
skip_find_neighbors=skip_find_neighbors@entry=false)
at ../source/src/mesh/mesh_base.C:273
#11 0x00007ffff6938b02 in libMesh::MeshRefinement::uniformly_refine (
this=this@entry=0x7fffffffa9e0, n=5)
at ../source/src/mesh/mesh_refinement.C:1723
#12 0x00007ffff7a8abc5 in GRINS::MeshBuilder::do_mesh_refinement_from_input (
this=this@entry=0x646820, input=..., comm=..., mesh=...)
at ../../source/src/solver/src/mesh_builder.C:393
#13 0x00007ffff7a8bc6d in GRINS::MeshBuilder::build (this=0x646820,
input=..., comm=...) at ../../source/src/solver/src/mesh_builder.C:167
#14 0x00007ffff7aa20ed in GRINS::SimulationBuilder::build_mesh (
this=this@entry=0x7fffffffb108, input=..., comm=...)
at ../../source/src/solver/src/simulation_builder.C:68
#15 0x00007ffff7a947d2 in GRINS::Simulation::Simulation (this=0x89b910,
input=..., sim_builder=..., comm=...)
at ../../source/src/solver/src/simulation.C:123
#16 0x00007ffff7acdae2 in GRINS::Runner::init (this=this@entry=0x7fffffffb100)
at ../../source/src/solver/src/runner.C:59
#17 0x0000000000402c9d in main (argc=<optimized out>, argv=<optimized out>)
at ../../source/src/apps/grins.C:31
On Thu, Nov 9, 2017 at 9:07 AM, Roy Stogner <royst...@ices.utexas.edu>
wrote:
>
> On Mon, 6 Nov 2017, Boris Boutkov wrote:
>
>> In some preliminary testing I encountered issues with the
>> LinearPartitioner,
>>
>
> Could you be more specific? That partitioner is dead simple, so I
> wouldn't have expected to see many bugs, but it's also awful, so if
> there were many bugs there's probably been nobody to use it and
> encounter them for a decade.
>
>> and the SFCPartitioner complained it wasn't enabled despite me
>> configuring using --enable-everything. Any ideas if there's anything
>> simple I could have forgotten?
>>
>
> Yeah: the SFC partitioner isn't under an LGPL-friendly license, so
> unless you add --disable-strict-lgpl to your configure line, it still
> gets dropped for that reason.
> ---
> Roy
>
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel