Hi,

I am facing a problem with a test that runs fine on some nodes, and
fails on others.

I have a heterogeneous cluster with 3 types of nodes:
1) Single socket, 4 cores
2) 2 sockets, 4 cores per socket
3) 2 sockets, 6 cores per socket

I am using:
 . salloc to allocate the nodes,
 . mpirun binding/mapping options "-bind-to-socket -bysocket"

# salloc -N 1 mpirun -n 4 -bind-to-socket -bysocket sleep 900

This command fails if the allocated node is of type #1 (single socket,
4 cores).
BTW, in that case orte_show_help() refers to a tag
("could-not-bind-to-socket") that does not exist in
help-odls-default.txt.

It succeeds, however, when run on nodes of type #2 or #3.
I think a "bind to socket" request should not return an error on a
single-socket machine, but should rather be a no-op.

The problem comes from the test
OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
called in odls_default_fork_local_proc() once the CPUs of the target
socket have been set in the binding mask:
========
    <snip>
    OPAL_PAFFINITY_CPU_ZERO(mask);
    for (n=0; n < orte_default_num_cores_per_socket; n++) {
        <snip>
        OPAL_PAFFINITY_CPU_SET(phys_cpu, mask);
    }
    /* if we did not bind it anywhere, then that is an error */
    OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
    if (!bound) {
        orte_show_help("help-odls-default.txt",
                       "odls-default:could-not-bind-to-socket", true);
        ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
    }
========
OPAL_PAFFINITY_PROCESS_IS_BOUND() returns true only if there are bits
set in the mask *AND* the number of bits set is less than the number of
CPUs on the machine. On a single-socket, 4-core machine, binding to the
socket sets all 4 CPUs in the mask, so the test fails, while on the
other kinds of machines it succeeds.

Again, I think the problem could be solved by changing the algorithm
and treating ORTE_BIND_TO_SOCKET on a single-socket machine as a
no-op.

Another solution could be to perform the
OPAL_PAFFINITY_PROCESS_IS_BOUND() test at the end of the loop only if
we are bound (orte_odls_globals.bound). That is actually the only case
where I see a justification for this test (see attached patch).

And maybe both solutions could be combined.

Regards,
Nadia


-- 
Nadia Derbey <nadia.der...@bull.net>
Do not test actual process binding in obvious cases

diff -r 0b851b2e7934 orte/mca/odls/default/odls_default_module.c
--- a/orte/mca/odls/default/odls_default_module.c	Thu Mar 18 16:10:25 2010 +0100
+++ b/orte/mca/odls/default/odls_default_module.c	Fri Apr 09 11:38:28 2010 +0200
@@ -747,12 +747,16 @@ static int odls_default_fork_local_proc(
                                          target_socket, phys_core, phys_cpu));
                     OPAL_PAFFINITY_CPU_SET(phys_cpu, mask);
                 }
-                /* if we did not bind it anywhere, then that is an error */
-                OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
-                if (!bound) {
-                    orte_show_help("help-odls-default.txt",
-                                   "odls-default:could-not-bind-to-socket", true);
-                    ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
+                /* if we actually did not bind it anywhere and it was
+                 * originally bound then that is an error
+                 */
+                if (orte_odls_globals.bound) {
+                    OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
+                    if (!bound) {
+                        orte_show_help("help-odls-default.txt",
+                                       "odls-default:could-not-bind-to-socket", true);
+                        ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
+                    }
                 }
                 if (orte_report_bindings) {
                     opal_output(0, "%s odls:default:fork binding child %s to socket %d cpus %04lx",
