On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
> I finally managed to track down some issues in mpi4py's test suite
> using Open MPI 1.8+. The code below should be enough to reproduce the
> problem. Run it under valgrind to make sense of my following
> diagnostics.
>
> In this code I'm creating a 2D, periodic Cartesian topology out of
> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
> links to itself. So we have size=1 but indegree=outdegree=4. However,
> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" request are
> being allocated to manage communication:
>
>     if (OMPI_COMM_IS_INTER(comm)) {
>         size = ompi_comm_remote_size(comm);
>     } else {
>         size = ompi_comm_size(comm);
>     }
>     basic_module->mccb_num_reqs = size * 2;
>     basic_module->mccb_reqs = (ompi_request_t**)
>         malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>
> I guess you have to also special-case for topologies and allocate
> indegree+outdegree requests (not sure about this number, just
> guessing).

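The mismatch described in the quoted message can be sketched with plain arithmetic. This is a minimal illustration only, not Open MPI code: the function names are hypothetical, the "size * 2" figure comes from the quoted snippet, and the "indegree + outdegree" figure is the guess from the quoted message, not a confirmed requirement. It assumes a periodic Cartesian topology, where each dimension contributes two neighbors.

```c
#include <stddef.h>

/* Hypothetical helper: number of requests set aside by the quoted
 * coll/basic snippet, which allocates "size * 2" request slots. */
size_t basic_num_reqs(size_t comm_size)
{
    return comm_size * 2;
}

/* Hypothetical helper: in a periodic Cartesian topology every dimension
 * contributes two neighbors, so indegree == outdegree == 2 * ndims. */
size_t cart_degree(size_t ndims)
{
    return 2 * ndims;
}

/* Hypothetical helper: the request count guessed in the quoted message,
 * indegree + outdegree. This number is an assumption, not a verified
 * requirement of the neighborhood collectives. */
size_t guessed_neighbor_reqs(size_t ndims)
{
    return cart_degree(ndims) + cart_degree(ndims);
}
```

For the case in the report (COMM_SELF, so size=1, with a 2D periodic topology), basic_num_reqs(1) gives 2 slots while cart_degree(2) gives 4 links in each direction, so the guessed requirement of 8 requests exceeds the allocation.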
I wish this were possible, but the topology information is not available
at that point. We may be able to change that, but I don't see the work
completing anytime soon.

I committed an alternative fix as r32796 and CMR'd it to 1.8.3. I can
confirm that the attached reproducer no longer produces a SEGV.

Let me know if you run into any more issues.

-Nathan