Hello OMPI Developers and Community,

I am interested in investigating dynamic runtime optimization of MPI topologies using an evolutionary approach.
My initial testing results in segfaults/SIGABRTs when I iteratively create a new communicator with reordering enabled, e.g.:

[88881] Signal: Segmentation fault: 11 (11)
[88881] Signal code: Address not mapped (1)
[88881] Failing at address: 0x0
[88881] [ 0] 0 libsystem_platform.dylib 0x00007fff69dff42d _sigtramp + 29
[88881] [ 1] 0 mpi_island_model_ea 0x0000000100000032 mpi_island_model_ea + 50
[88881] [ 2] 0 mca_topo_treematch.so 0x0000000105ddcbf9 free_list_child + 41
[88881] [ 3] 0 mca_topo_treematch.so 0x0000000105ddcbf9 free_list_child + 41
[88881] [ 4] 0 mca_topo_treematch.so 0x0000000105ddcd1f tm_free_tree + 47
[88881] [ 5] 0 mca_topo_treematch.so 0x0000000105dd6967 mca_topo_treematch_dist_graph_create + 9479
[88881] [ 6] 0 libmpi.40.dylib 0x00000001001992e0 MPI_Dist_graph_create + 640
[88881] [ 7] 0 mpi_island_model_ea 0x00000001000050c7 main + 1831

I see in some documentation that MPI_Dist_graph_create is not interrupt safe, which I interpret to mean that it is not really designed for iterative use without some sort of safeguard to keep calls from overlapping.

I guess my question is: are the topology mapping functions really meant to be called iteratively, or are they meant for single use? If you guys think this is something that might be possible, do you have any suggestions for calling the topology mapping iteratively, or any hints, docs, etc. on what else might be going wrong here?

Thanks,
Bradley