On Fri, 2010-04-09 at 14:23 -0400, Terry Dontje wrote:
> Ralph Castain wrote: 
> > Okay, just wanted to ensure everyone was working from the same base
> > code. 
> > 
> > 
> > Terry, Brad: you might want to look this proposed change over.
> > Something doesn't quite look right to me, but I haven't really
> > walked through the code to check it.
> > 
> > 
> At first blush I don't really get the usage of orte_odls_globals.bound
> in you patch.  It would seem to me that the insertion of that
> conditional would prevent the check it surrounds being done when the
> process has not been bounded prior to startup which is a common case.

Well, if you have a look at the algo in the ORTE_BIND_TO_SOCKET path
(odls_default_fork_local_proc() in odls_default_module.c):

<set target_socket depending on the desired mapping>
<set my paffinity mask to 0>       (line 715)
<for each core in the socket> {
    <get the associated phys_core>
    <get the associated phys_cpu>
    <if we are bound (orte_odls_globals.bound)> {
        <if phys_cpu does not belong to the cpus I'm bound to>
            continue
    }
    <set phys-cpu bit in my affinity mask>
}
<check if something is set in my affinity mask>
...


What I'm saying is that the only way to have nothing set in the affinity
mask (which would justify the last test) is to have never called the
<set phys_cpu in my affinity mask> instruction. This means:
  . the test on orte_odls_globals.bound is true
  . call <continue> for all the cores in the socket.

In the other path, what we are doing is checking if we have set one or
more bits in a mask after having actually set them: don't you think it's
useless?

That's why I'm suggesting to call the last check only if
orte_odls_globals.bound is true.

Regards,
Nadia
> 
> --td
> 
> 
> > 
> > On Apr 9, 2010, at 9:33 AM, Terry Dontje wrote:
> > 
> > > Nadia Derbey wrote: 
> > > > On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote:
> > > >   
> > > > > Just to check: is this with the latest trunk? Brad and Terry have 
> > > > > been making changes to this section of code, including modifying the 
> > > > > PROCESS_IS_BOUND test...
> > > > > 
> > > > > 
> > > > >     
> > > > 
> > > > Well, it was on the v1.5. But I just checked: looks like
> > > >   1. the call to OPAL_PAFFINITY_PROCESS_IS_BOUND is still there in
> > > >      odls_default_fork_local_proc()
> > > >   2. OPAL_PAFFINITY_PROCESS_IS_BOUND() is defined the same way
> > > > 
> > > > But, I'll give it a try with the latest trunk.
> > > > 
> > > > Regards,
> > > > Nadia
> > > > 
> > > >   
> > > The changes, I've done do not touch
> > > OPAL_PAFFINITY_PROCESS_IS_BOUND at all.  Also, I am only touching
> > > code related to the "bind-to-core" option so I really doubt if my
> > > changes are causing issues here.
> > > 
> > > --td
> > > > > On Apr 9, 2010, at 3:39 AM, Nadia Derbey wrote:
> > > > > 
> > > > >     
> > > > > > Hi,
> > > > > > 
> > > > > > I am facing a problem with a test that runs fine on some nodes, and
> > > > > > fails on others.
> > > > > > 
> > > > > > I have a heterogenous cluster, with 3 types of nodes:
> > > > > > 1) Single socket , 4 cores
> > > > > > 2) 2 sockets, 4cores per socket
> > > > > > 3) 2 sockets, 6 cores/socket
> > > > > > 
> > > > > > I am using:
> > > > > > . salloc to allocate the nodes,
> > > > > > . mpirun binding/mapping options "-bind-to-socket -bysocket"
> > > > > > 
> > > > > > # salloc -N 1 mpirun -n 4 -bind-to-socket -bysocket sleep 900
> > > > > > 
> > > > > > This command fails if the allocated node is of type #1 (single 
> > > > > > socket/4
> > > > > > cpus).
> > > > > > BTW, in that case orte_show_help is referencing a tag
> > > > > > ("could-not-bind-to-socket") that does not exist in
> > > > > > help-odls-default.txt.
> > > > > > 
> > > > > > While it succeeds when run on nodes of type #2 or 3.
> > > > > > I think a "bind to socket" should not return an error on a single 
> > > > > > socket
> > > > > > machine, but rather be a noop.
> > > > > > 
> > > > > > The problem comes from the test
> > > > > > OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> > > > > > called in odls_default_fork_local_proc() after the binding to the
> > > > > > processors socket has been done:
> > > > > > ========
> > > > > >    <snip>
> > > > > >    OPAL_PAFFINITY_CPU_ZERO(mask);
> > > > > >    for (n=0; n < orte_default_num_cores_per_socket; n++) {
> > > > > >        <snip>
> > > > > >        OPAL_PAFFINITY_CPU_SET(phys_cpu, mask);
> > > > > >    }
> > > > > >    /* if we did not bind it anywhere, then that is an error */
> > > > > >    OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> > > > > >    if (!bound) {
> > > > > >        orte_show_help("help-odls-default.txt",
> > > > > >                       "odls-default:could-not-bind-to-socket", 
> > > > > > true);
> > > > > >        ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
> > > > > >    }
> > > > > > ========
> > > > > > OPAL_PAFFINITY_PROCESS_IS_BOUND() will return true if there bits 
> > > > > > set in
> > > > > > the mask *AND* the number of bits set is lesser than the number of 
> > > > > > cpus
> > > > > > on the machine. Thus on a single socket, 4 cores machine the test 
> > > > > > will
> > > > > > fail. While on other the kinds of machines it will succeed.
> > > > > > 
> > > > > > Again, I think the problem could be solved by changing the 
> > > > > > alogrithm,
> > > > > > and assuming that ORTE_BIND_TO_SOCKET, on a single socket machine =
> > > > > > noop.
> > > > > > 
> > > > > > Another solution could be to call the test
> > > > > > OPAL_PAFFINITY_PROCESS_IS_BOUND() at the end of the loop only if we 
> > > > > > are
> > > > > > bound (orte_odls_globals.bound). Actually that is the only case 
> > > > > > where I
> > > > > > see a justification to this test (see attached patch).
> > > > > > 
> > > > > > And may be both solutions could be mixed.
> > > > > > 
> > > > > > Regards,
> > > > > > Nadia
> > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > Nadia Derbey <nadia.der...@bull.net>
> > > > > > <001_fix_process_binding_test.patch>_______________________________________________
> > > > > > devel mailing list
> > > > > > de...@open-mpi.org
> > > > > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > > > >       
> > > > > _______________________________________________
> > > > > devel mailing list
> > > > > de...@open-mpi.org
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > > > 
> > > > >     
> > > 
> > > 
> > > -- 
> > > <Mail Attachment.gif>
> > > Terry D. Dontje | Principal Software Engineer
> > > Developer Tools Engineering | +1.650.633.7054
> > > Oracle - Performance Technologies
> > > 95 Network Drive, Burlington, MA 01803
> > > Email terry.don...@oracle.com
> > > 
> > > 
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > 
> > 
> > 
> > ____________________________________________________________________
> > 
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Oracle
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
-- 
Nadia Derbey <nadia.der...@bull.net>

Reply via email to