What: We never call del_procs for the procs in comm world. As a result,
we leak the bml endpoints created by r2.

The proposed solution is not ideal, but it avoids adding a call to
del_procs for comm world, which I know would require more discussion
since there is likely a reason we do not make that call. I propose we
delete any remaining bml endpoints when we tear down the ompi_proc_t:

diff --git a/ompi/proc/proc.c b/ompi/proc/proc.c
index f549335..9ea0311 100644
--- a/ompi/proc/proc.c
+++ b/ompi/proc/proc.c
@@ -89,6 +89,13 @@ void ompi_proc_destruct(ompi_proc_t* proc)
     OPAL_THREAD_LOCK(&ompi_proc_lock);
     opal_list_remove_item(&ompi_proc_list, (opal_list_item_t*)proc);
     OPAL_THREAD_UNLOCK(&ompi_proc_lock);
+
+#if defined(OMPI_PROC_ENDPOINT_TAG_BML)
+    /* release the bml endpoint if it still exists */
+    if (proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML]) {
+        OBJ_RELEASE(proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML]);
+    }
+#endif
 }
 
This fixes the leak and appears to have no negative side effects for
r2.
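
To make the idea concrete outside of the tree, here is a minimal
standalone sketch of the pattern. It is not Open MPI code: every toy_*
name, the hand-rolled reference count, and the live_endpoints counter are
hypothetical stand-ins for ompi_proc_t, OBJ_RELEASE, and the bml
endpoint. It only shows that releasing a leftover endpoint in the proc
destructor removes a leak that otherwise scales with the number of procs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are real OPAL/OMPI types. */
#define TOY_ENDPOINT_TAG_BML 0
#define TOY_ENDPOINT_TAG_MAX 1

typedef struct toy_endpoint {
    int refcount;
} toy_endpoint_t;

typedef struct toy_proc {
    toy_endpoint_t *endpoints[TOY_ENDPOINT_TAG_MAX];
} toy_proc_t;

/* Number of endpoints currently allocated (for the demo only). */
static int live_endpoints = 0;

static toy_endpoint_t *toy_endpoint_new(void)
{
    toy_endpoint_t *ep = malloc(sizeof(*ep));
    if (NULL == ep) {
        return NULL;
    }
    ep->refcount = 1;
    ++live_endpoints;
    return ep;
}

/* Drop one reference; free the endpoint when the count reaches zero. */
static void toy_endpoint_release(toy_endpoint_t *ep)
{
    assert(ep->refcount > 0);
    if (0 == --ep->refcount) {
        free(ep);
        --live_endpoints;
    }
}

/* Proc teardown. When release_in_destruct is nonzero, any endpoint
 * still attached is released here, mirroring the proposed change. */
static void toy_proc_destruct(toy_proc_t *proc, int release_in_destruct)
{
    toy_endpoint_t *ep = proc->endpoints[TOY_ENDPOINT_TAG_BML];
    if (release_in_destruct && NULL != ep) {
        toy_endpoint_release(ep);
        proc->endpoints[TOY_ENDPOINT_TAG_BML] = NULL;
    }
}

/* Create nprocs procs with one endpoint each, tear them down without
 * any del_procs-style call, and report how many endpoints are left. */
static int toy_leaked_after_teardown(int nprocs, int release_in_destruct)
{
    toy_proc_t *procs = calloc((size_t)nprocs, sizeof(*procs));
    int i, leaked;

    if (NULL == procs) {
        return -1;
    }
    for (i = 0; i < nprocs; ++i) {
        procs[i].endpoints[TOY_ENDPOINT_TAG_BML] = toy_endpoint_new();
    }
    for (i = 0; i < nprocs; ++i) {
        toy_proc_destruct(&procs[i], release_in_destruct);
    }
    leaked = live_endpoints;

    /* Clean up leftovers so the demo itself does not leak. */
    for (i = 0; i < nprocs; ++i) {
        toy_proc_destruct(&procs[i], 1);
    }
    free(procs);
    return leaked;
}
```

In this toy model, calling toy_leaked_after_teardown(n, 0) leaves n
endpoints behind, one per proc, while toy_leaked_after_teardown(n, 1)
leaves none. The real code differs in that the endpoint is an OPAL
object and other components may hold references to it.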

Why: I am trying to clean up the last remaining leaks in the Open MPI
code base. This is one of the larger ones, as it grows with the size of
comm world.

When: I want this to go into 1.8.2 if possible. Setting a short timeout
of 1 week.

Keep in mind that I do not know the full history of add_procs/del_procs,
so there may be a better way to fix this. This RFC is meant to open the
discussion about how to address this leak. If the above fix looks OK, I
will commit it.

-Nathan
