Thanks for your response, George.

Just having your confirmation that this should be okay to use iteratively is a huge help.

After further investigation, this only seems to occur on my test workstation with the following…

Open MPI repo revision: v4.0.2
Open MPI release date: Oct 07, 2019
Open RTE: 4.0.2
Configured architecture: x86_64-apple-darwin19.2.0

g++ --version
Apple clang version 11.0.3 (clang-1103.0.32.29)
Target: x86_64-apple-darwin19.2.0
Thread model: posix


I am not currently able to reproduce the errors on an actual Linux cluster running Open MPI 4.0.2.

So, this is probably insignificant for most production use, but in case you are interested, from what I can tell the following code should reproduce the error for Open MPI/Clang...


#include <stdio.h>
#include <mpi.h>

int main(int argc, const char * argv[]) {

    MPI_Init(NULL, NULL);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    for(int run=1; run<=30; run++) {

        MPI_Comm topology;

        /* each rank contributes a single edge to the distributed graph */
        const int send[1] = { world_rank == world_size-1 ? 0 : world_rank+1 };
        const int receive[1] = { world_rank > 0 ? world_rank-1 : world_size-1 };
        const int degrees[1] = { 1 };
        const int weights[1] = { 1 };

        printf("rank %d send -> %d\r\n", world_rank, send[0]);
        printf("rank %d receive -> %d\r\n", world_rank, receive[0]);

        MPI_Comm oldcomm = MPI_COMM_WORLD;

        /* reorder = 1, which appears to be what routes the call through
           the treematch component seen in the backtrace below */
        MPI_Dist_graph_create(oldcomm, 1, send, degrees, receive, weights,
                              MPI_INFO_NULL, 1, &topology);

    }

    MPI_Finalize();

    return 0;
}
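
For what it's worth, the reproducer never frees the communicators it creates. In my real code each iteration would release the topology before building the next one, roughly like this (just a sketch of the intended cleanup, not part of the reproducer above):

        /* hypothetical cleanup at the bottom of the loop body,
           once the new communicator is no longer needed */
        MPI_Comm_free(&topology);

The backtrace in the earlier message points at the create call itself, though, so the cleanup is probably not the deciding factor.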


Thanks,

-Bradley



On Apr 6, 2020, at 10:36 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

Bradley,

You call them through a blocking MPI function, so the operation is completed by the time you return from the MPI call. So, short story: you should be safe calling dist_graph_create in a loop.

The segfault indicates a memory issue in some of the internals of the treematch component. Do you have an example that reproduces this issue so that I can take a look and fix it?

Thanks,
  George.


On Mon, Apr 6, 2020 at 11:31 AM Bradley Morgan via devel <devel@lists.open-mpi.org> wrote:
Hello OMPI Developers and Community,

I am interested in investigating dynamic runtime optimization of MPI topologies 
using an evolutionary approach.

My initial testing is resulting in segfaults/SIGABRTs when I attempt to iteratively create a new communicator with reordering enabled, e.g.…

[88881] Signal: Segmentation fault: 11 (11)
[88881] Signal code: Address not mapped (1)
[88881] Failing at address: 0x0
[88881] [ 0] 0   libsystem_platform.dylib   0x00007fff69dff42d _sigtramp + 29
[88881] [ 1] 0   mpi_island_model_ea        0x0000000100000032 mpi_island_model_ea + 50
[88881] [ 2] 0   mca_topo_treematch.so      0x0000000105ddcbf9 free_list_child + 41
[88881] [ 3] 0   mca_topo_treematch.so      0x0000000105ddcbf9 free_list_child + 41
[88881] [ 4] 0   mca_topo_treematch.so      0x0000000105ddcd1f tm_free_tree + 47
[88881] [ 5] 0   mca_topo_treematch.so      0x0000000105dd6967 mca_topo_treematch_dist_graph_create + 9479
[88881] [ 6] 0   libmpi.40.dylib            0x00000001001992e0 MPI_Dist_graph_create + 640
[88881] [ 7] 0   mpi_island_model_ea        0x00000001000050c7 main + 1831


I see in some documentation that MPI_Dist_graph_create is not interrupt safe, which I interpret to mean it is not really designed for iterative use without some sort of safeguard to keep calls from overlapping.
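
For concreteness, by a safeguard I just mean something along these lines, where the calls are explicitly serialized and each communicator is released before the next one is created (purely a hypothetical sketch; sources, destinations, degrees, and weights stand in for the per-rank neighbor arrays):

    for (int run = 1; run <= 30; run++) {
        MPI_Comm topology;
        MPI_Barrier(MPI_COMM_WORLD);   /* make sure no rank is still inside the previous iteration */
        MPI_Dist_graph_create(MPI_COMM_WORLD, 1, sources, degrees,
                              destinations, weights, MPI_INFO_NULL,
                              1 /* reorder */, &topology);
        MPI_Comm_free(&topology);      /* release before the next create */
    }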

I guess my question is: are the topology mapping functions really meant to be called iteratively, or are they meant for single use?

If you think this is something that might be possible, do you have any suggestions for calling the topology mapping functions iteratively, or any hints, docs, etc. on what else might be going wrong here?


Thanks,

Bradley





