On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
> I finally managed to track down some issues in mpi4py's test suite
> using Open MPI 1.8+. The code below should be enough to reproduce the
> problem. Run it under valgrind to make sense of my following
> diagnostics.
> 
> In this code I'm creating a 2D, periodic Cartesian topology out of
> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
> links to itself. So we have size=1 but indegree=outdegree=4. However,
> in ompi/mca/coll/basic/coll_basic_module.c, only "size * 2" requests
> are being allocated to manage communication:
> 
>     if (OMPI_COMM_IS_INTER(comm)) {
>         size = ompi_comm_remote_size(comm);
>     } else {
>         size = ompi_comm_size(comm);
>     }
>     basic_module->mccb_num_reqs = size * 2;
>     basic_module->mccb_reqs = (ompi_request_t**)
>         malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
> 
> I guess you have to also special-case for topologies and allocate
> indegree+outdegree requests (not sure about this number, just
> guessing).
>

I wish this were possible, but the topology information is not available
at that point. We may be able to change that but I don't see the work
completing anytime soon. I committed an alternative fix as r32796 and
CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
produces a SEGV. Let me know if you run into any more issues.
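
To make the mismatch in the quoted report concrete, here is a minimal
sketch in plain C with no MPI calls. The function names are hypothetical
and exist only for illustration. allocated_reqs() mirrors the quoted
"size * 2" allocation. cart_degree() uses the MPI rule that a process in
a Cartesian topology has two neighbors per dimension. needed_reqs()
encodes the report's guess of one request per in-link plus one per
out-link, which is an assumption and not a statement about what coll/basic
actually requires or about how r32796 fixes it.

```c
#include <assert.h>

/* Number of requests provided by the quoted allocation:
 * mccb_num_reqs = size * 2. */
int allocated_reqs(int comm_size)
{
    assert(comm_size > 0);
    return comm_size * 2;
}

/* In-links (and, equally, out-links) of one process in an
 * ndims-dimensional Cartesian topology: two neighbors per dimension. */
int cart_degree(int ndims)
{
    assert(ndims > 0);
    return 2 * ndims;
}

/* ASSUMPTION (the guess from the report): one request per in-link
 * plus one request per out-link. */
int needed_reqs(int ndims)
{
    return cart_degree(ndims) + cart_degree(ndims);
}
```

For the case in the report (size = 1, two dimensions), allocated_reqs(1)
is 2 while needed_reqs(2) is 8 under that assumption, so a fixed
"size * 2" array would be too small.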


-Nathan

