On Mon, 2010-04-12 at 07:50 -0600, Ralph Castain wrote:
> Guess I'll jump in here as I finally had a few minutes to look at the code 
> and think about your original note. In fact, I believe your original 
> statement is the source of contention.
> 
> If someone tells us -bind-to-socket, but there is only one socket, then we 
> really cannot bind them to anything. Any check by their code would reveal 
> that they had not, in fact, been bound - raising questions as to whether or 
> not OMPI is performing the request. Our operating standard has been to error 
> out if the user specifies something we cannot do to avoid that kind of 
> confusion. This is what generated the code in the system today.
> 
> Now I can see an argument that -bind-to-socket with one socket maybe 
> shouldn't generate an error, but that decision then has to get reflected in 
> other code areas as well.

Actually, that was my original point: -bind-to-socket on a single socket
should IMHO lead to a noop, but not an error.

> 
> As for the test you cite -  it actually performs a valuable function and was 
> added to catch specific scenarios. In particular, if you follow the code flow 
> up just a little, you will see that it is possible to complete the loop 
> without ever actually setting a bit in the mask. This happens when none of 
> the cpus in that socket have been assigned to us via an external bind.

This is exactly what I said, but may be I didn't express right:
  . the test on orte_odls_globals.bound is true   
            ---> means we have an external bind
  . call <continue> for all the cores in the socket.
            ---> none of the cpus on the socket belongs to our binding

Regards,

>  People actually use that as a means of suballocating nodes, so the test 
> needs to be there. Again, if the user said "bind to socket", but none of that 
> socket's cores are assigned for our use, that is an error.
> 
> I haven't looked at your specific fix, but I agree with Terry's question. It 
> seems to me that whether or not we were externally bound is irrelevant. Even 
> if the overall result is what you want, I think a more logically 
> understandable test would help others reading the code.
> 
> But first we need to resolve the question: should this scenario return an 
> error or not?
> 
> 
> On Apr 12, 2010, at 1:43 AM, Nadia Derbey wrote:
> 
> > On Fri, 2010-04-09 at 14:23 -0400, Terry Dontje wrote:
> >> Ralph Castain wrote: 
> >>> Okay, just wanted to ensure everyone was working from the same base
> >>> code. 
> >>> 
> >>> 
> >>> Terry, Brad: you might want to look this proposed change over.
> >>> Something doesn't quite look right to me, but I haven't really
> >>> walked through the code to check it.
> >>> 
> >>> 
> >> At first blush I don't really get the usage of orte_odls_globals.bound
> >> in you patch.  It would seem to me that the insertion of that
> >> conditional would prevent the check it surrounds being done when the
> >> process has not been bounded prior to startup which is a common case.
> > 
> > Well, if you have a look at the algo in the ORTE_BIND_TO_SOCKET path
> > (odls_default_fork_local_proc() in odls_default_module.c):
> > 
> > <set target_socket depending on the desired mapping>
> > <set my paffinity mask to 0>       (line 715)
> > <for each core in the socket> {
> >    <get the associated phys_core>
> >    <get the associated phys_cpu>
> >    <if we are bound (orte_odls_globals.bound)> {
> >        <if phys_cpu does not belong to the cpus I'm bound to>
> >            continue
> >    }
> >    <set phys-cpu bit in my affinity mask>
> > }
> > <check if something is set in my affinity mask>
> > ...
> > 
> > 
> > What I'm saying is that the only way to have nothing set in the affinity
> > mask (which would justify the last test) is to have never called the
> > <set phys_cpu in my affinity mask> instruction. This means:
> >  . the test on orte_odls_globals.bound is true
> >  . call <continue> for all the cores in the socket.
> > 
> > In the other path, what we are doing is checking if we have set one or
> > more bits in a mask after having actually set them: don't you think it's
> > useless?
> > 
> > That's why I'm suggesting to call the last check only if
> > orte_odls_globals.bound is true.
> > 
> > Regards,
> > Nadia
> >> 
> >> --td
> >> 
> >> 
> >>> 
> >>> On Apr 9, 2010, at 9:33 AM, Terry Dontje wrote:
> >>> 
> >>>> Nadia Derbey wrote: 
> >>>>> On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote:
> >>>>> 
> >>>>>> Just to check: is this with the latest trunk? Brad and Terry have been 
> >>>>>> making changes to this section of code, including modifying the 
> >>>>>> PROCESS_IS_BOUND test...
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> Well, it was on the v1.5. But I just checked: looks like
> >>>>>  1. the call to OPAL_PAFFINITY_PROCESS_IS_BOUND is still there in
> >>>>>     odls_default_fork_local_proc()
> >>>>>  2. OPAL_PAFFINITY_PROCESS_IS_BOUND() is defined the same way
> >>>>> 
> >>>>> But, I'll give it a try with the latest trunk.
> >>>>> 
> >>>>> Regards,
> >>>>> Nadia
> >>>>> 
> >>>>> 
> >>>> The changes, I've done do not touch
> >>>> OPAL_PAFFINITY_PROCESS_IS_BOUND at all.  Also, I am only touching
> >>>> code related to the "bind-to-core" option so I really doubt if my
> >>>> changes are causing issues here.
> >>>> 
> >>>> --td
> >>>>>> On Apr 9, 2010, at 3:39 AM, Nadia Derbey wrote:
> >>>>>> 
> >>>>>> 
> >>>>>>> Hi,
> >>>>>>> 
> >>>>>>> I am facing a problem with a test that runs fine on some nodes, and
> >>>>>>> fails on others.
> >>>>>>> 
> >>>>>>> I have a heterogenous cluster, with 3 types of nodes:
> >>>>>>> 1) Single socket , 4 cores
> >>>>>>> 2) 2 sockets, 4cores per socket
> >>>>>>> 3) 2 sockets, 6 cores/socket
> >>>>>>> 
> >>>>>>> I am using:
> >>>>>>> . salloc to allocate the nodes,
> >>>>>>> . mpirun binding/mapping options "-bind-to-socket -bysocket"
> >>>>>>> 
> >>>>>>> # salloc -N 1 mpirun -n 4 -bind-to-socket -bysocket sleep 900
> >>>>>>> 
> >>>>>>> This command fails if the allocated node is of type #1 (single 
> >>>>>>> socket/4
> >>>>>>> cpus).
> >>>>>>> BTW, in that case orte_show_help is referencing a tag
> >>>>>>> ("could-not-bind-to-socket") that does not exist in
> >>>>>>> help-odls-default.txt.
> >>>>>>> 
> >>>>>>> While it succeeds when run on nodes of type #2 or 3.
> >>>>>>> I think a "bind to socket" should not return an error on a single 
> >>>>>>> socket
> >>>>>>> machine, but rather be a noop.
> >>>>>>> 
> >>>>>>> The problem comes from the test
> >>>>>>> OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> >>>>>>> called in odls_default_fork_local_proc() after the binding to the
> >>>>>>> processors socket has been done:
> >>>>>>> ========
> >>>>>>>   <snip>
> >>>>>>>   OPAL_PAFFINITY_CPU_ZERO(mask);
> >>>>>>>   for (n=0; n < orte_default_num_cores_per_socket; n++) {
> >>>>>>>       <snip>
> >>>>>>>       OPAL_PAFFINITY_CPU_SET(phys_cpu, mask);
> >>>>>>>   }
> >>>>>>>   /* if we did not bind it anywhere, then that is an error */
> >>>>>>>   OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> >>>>>>>   if (!bound) {
> >>>>>>>       orte_show_help("help-odls-default.txt",
> >>>>>>>                      "odls-default:could-not-bind-to-socket", true);
> >>>>>>>       ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
> >>>>>>>   }
> >>>>>>> ========
> >>>>>>> OPAL_PAFFINITY_PROCESS_IS_BOUND() will return true if there bits set 
> >>>>>>> in
> >>>>>>> the mask *AND* the number of bits set is lesser than the number of 
> >>>>>>> cpus
> >>>>>>> on the machine. Thus on a single socket, 4 cores machine the test will
> >>>>>>> fail. While on other the kinds of machines it will succeed.
> >>>>>>> 
> >>>>>>> Again, I think the problem could be solved by changing the alogrithm,
> >>>>>>> and assuming that ORTE_BIND_TO_SOCKET, on a single socket machine =
> >>>>>>> noop.
> >>>>>>> 
> >>>>>>> Another solution could be to call the test
> >>>>>>> OPAL_PAFFINITY_PROCESS_IS_BOUND() at the end of the loop only if we 
> >>>>>>> are
> >>>>>>> bound (orte_odls_globals.bound). Actually that is the only case where 
> >>>>>>> I
> >>>>>>> see a justification to this test (see attached patch).
> >>>>>>> 
> >>>>>>> And may be both solutions could be mixed.
> >>>>>>> 
> >>>>>>> Regards,
> >>>>>>> Nadia
> >>>>>>> 
> >>>>>>> 
> >>>>>>> -- 
> >>>>>>> Nadia Derbey <nadia.der...@bull.net>
> >>>>>>> <001_fix_process_binding_test.patch>_______________________________________________
> >>>>>>> devel mailing list
> >>>>>>> de...@open-mpi.org
> >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>>>>> 
> >>>>>> _______________________________________________
> >>>>>> devel mailing list
> >>>>>> de...@open-mpi.org
> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>>>> 
> >>>>>> 
> >>>> 
> >>>> 
> >>>> -- 
> >>>> <Mail Attachment.gif>
> >>>> Terry D. Dontje | Principal Software Engineer
> >>>> Developer Tools Engineering | +1.650.633.7054
> >>>> Oracle - Performance Technologies
> >>>> 95 Network Drive, Burlington, MA 01803
> >>>> Email terry.don...@oracle.com
> >>>> 
> >>>> 
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> de...@open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> 
> >>> 
> >>> 
> >>> ____________________________________________________________________
> >>> 
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> 
> >> 
> >> -- 
> >> Oracle
> >> Terry D. Dontje | Principal Software Engineer
> >> Developer Tools Engineering | +1.650.633.7054
> >> Oracle - Performance Technologies
> >> 95 Network Drive, Burlington, MA 01803
> >> Email terry.don...@oracle.com
> >> 
> >> 
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > -- 
> > Nadia Derbey <nadia.der...@bull.net>
> > 
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
-- 
Nadia Derbey <nadia.der...@bull.net>

Reply via email to