The progress functions do not need to return an error code. If there is an error, they should propagate it back through the descriptors. The only purpose of a progress function's return value is to report whether any event happened during this round of progress. opal_progress uses it to trigger the yield (if one is necessary).
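
To illustrate the point about the yield, here is a minimal sketch of how a progress loop could consume those event counts. It is not the actual opal_progress code: progress_fn_t, progress_round, idle_yield, busy_component and idle_component are hypothetical names, and a counter stands in for a real call such as sched_yield().

```c
#include <stddef.h>

/* Hypothetical callback type: each progress function returns the number
 * of events it handled in this round, never an error code. */
typedef int (*progress_fn_t)(void);

/* Hypothetical stand-in for a real yield such as sched_yield();
 * it only counts how often the loop decided to yield. */
static int yield_count = 0;
static void idle_yield(void) { yield_count++; }

/* Call every registered progress function once, sum the event counts,
 * and yield only if the whole round was idle. */
static int progress_round(progress_fn_t *callbacks, size_t n)
{
    int events = 0;
    for (size_t i = 0; i < n; i++) {
        events += callbacks[i]();
    }
    if (events <= 0) {
        idle_yield();
    }
    return events;
}

/* Two hypothetical components: one that always handles two events,
 * one that never has anything to do. */
static int busy_component(void) { return 2; }
static int idle_component(void) { return 0; }
```

Under these assumptions, a round that includes busy_component returns 2 and does not yield, while a round with only idle_component returns 0 and yields once. A progress function that returned a negative error code here would be read as "no events" and would skew the yield decision, which is why the two meanings should not share one variable.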

Anyway, this is a good catch. We're doing a big mixup there. Attached you will find a patch that cleans up this problem. As expected, there is no performance impact ...

  george.

Here is the patch:

Index: btl_sm_component.c
===================================================================
--- btl_sm_component.c  (revision 22176)
+++ btl_sm_component.c  (working copy)
@@ -361,7 +361,7 @@
     sm_fifo_t *fifo = NULL;
     mca_btl_sm_hdr_t *hdr;
     int my_smp_rank = mca_btl_sm_component.my_smp_rank;
-    int peer_smp_rank, j, rc = 0;
+    int peer_smp_rank, j, rc = 0, events = 0;

     /* first, deal with any pending sends */
     /* This check should be fast since we only need to check one variable. */
@@ -399,7 +399,7 @@
             continue;
         }

-        rc++;
+        events++;
         /* dispatch fragment by type */
         switch(((uintptr_t)hdr) & MCA_BTL_SM_FRAG_TYPE_MASK) {
             case MCA_BTL_SM_FRAG_SEND:
@@ -480,5 +480,5 @@
                 break;
         }
     }
-    return rc;
+    return events;
 }
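
The separation the patch makes can be sketched in isolation as follows. This is a simplified illustration, not the sm BTL code: sketch_desc_t, record_status, process_fragment and sketch_progress are hypothetical names, and the failure rule is made up to exercise the error path. The per-operation status lives in rc and goes back through the descriptor's callback, while events only counts what happened.

```c
#include <stddef.h>

/* Hypothetical descriptor: its completion callback receives the status
 * of the operation, which is how errors reach the upper layer. */
typedef struct sketch_desc {
    int status;                                 /* last status reported */
    void (*cbfunc)(struct sketch_desc *, int);  /* completion callback */
} sketch_desc_t;

/* Hypothetical callback that just stores the status in the descriptor. */
static void record_status(sketch_desc_t *d, int status) { d->status = status; }

/* Hypothetical operation: fails (-1) for odd ids, succeeds (0) otherwise. */
static int process_fragment(int id) { return (id % 2) ? -1 : 0; }

static int sketch_progress(sketch_desc_t *descs, const int *ids, size_t n)
{
    int rc, events = 0;
    for (size_t i = 0; i < n; i++) {
        rc = process_fragment(ids[i]);   /* status of this one operation */
        descs[i].cbfunc(&descs[i], rc);  /* error goes back via the descriptor */
        events++;                        /* one event, success or failure */
    }
    return events;                       /* an event count, never an error code */
}
```

With two fragments, one of which fails, sketch_progress still returns 2; the failure is visible only in the status of the corresponding descriptor.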


On Oct 30, 2009, at 15:22, Eugene Loh wrote:

What is the significance of the btl_sm_component_progress() return code rc? It appears to be incremented each time something is read off the FIFO, but it is also the return code from writing to a FIFO. This seems kind of dual-purpose.