On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
> By definition, if you bind to all available cpus in the OS, you are
> bound to nothing (i.e., "unbound") as your process runs on any
> available cpu.
> 
> 
> PLPA doesn't care, and I personally don't care. I was just explaining
> why it generates an error in the odls.
> 
> 
> A user app would detect its binding by (a) getting the affinity mask
> from the OS, and then (b) seeing if the bits are set to '1' for all
> available processors. If they are, then you are not bound - there is no
> mechanism available for checking "are the bits set only for the
> processors I asked to be bound to". The OS doesn't track what you
> asked for, it only tracks where you are bound - and a mask with all
> '1's is defined as "unbound".
> 
> 
> So the reason for my question was simple: a user asked us to "bind"
> their process. If their process checks to see if it is bound, it will
> return "no". The user would therefore be led to believe that OMPI had
> failed to execute their request, when in fact we did execute it - but
> the result was (as Nadia says) a "no-op".
> 
> 
> After talking with Jeff, I think he has the right answer. It is a
> method we have used elsewhere, so it isn't unexpected behavior.
> Basically, he proposed that we use an mca param to control this
> behavior:
> 
> 
> * default: generate an error message as the "bind" results in a no-op,
> and this is our current behavior
> 
> 
> * warn: generate a warning that the binding wound up being a "no-op",
> but continue working
> 
> 
> * quiet: just ignore it and keep going

Excellent, I completely agree (though I would have put the 2nd star as
the default behavior, but never mind, I don't want to restart the
discussion ;-) )
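
For what it's worth, the three modes listed above could be sketched roughly as below. This is only an illustration: the enum, the function names and the message texts are made up for this mail, no parameter name has been chosen in this thread, and a real implementation would go through the usual mca param registration and orte_show_help rather than fprintf.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy values, one per bullet in the proposal. */
typedef enum {
    NOOP_BIND_ERROR,   /* default: treat a no-op binding as an error */
    NOOP_BIND_WARN,    /* warn, but continue working */
    NOOP_BIND_QUIET    /* ignore it and keep going */
} noop_bind_policy_t;

/* Map the parameter string to a policy. A missing or unknown value
 * falls back to the default (error), i.e. the current behavior. */
noop_bind_policy_t parse_noop_bind_policy(const char *value)
{
    if (value != NULL) {
        if (strcmp(value, "warn") == 0) {
            return NOOP_BIND_WARN;
        }
        if (strcmp(value, "quiet") == 0) {
            return NOOP_BIND_QUIET;
        }
    }
    return NOOP_BIND_ERROR;
}

/* Called when the requested binding turned out to be a no-op.
 * Returns 0 if the launch should continue, -1 if it should abort. */
int handle_noop_binding(noop_bind_policy_t policy)
{
    switch (policy) {
    case NOOP_BIND_WARN:
        fprintf(stderr, "warning: requested binding is a no-op\n");
        return 0;
    case NOOP_BIND_QUIET:
        return 0;
    case NOOP_BIND_ERROR:
    default:
        fprintf(stderr, "error: requested binding is a no-op\n");
        return -1;
    }
}
```

With the value "quiet" set in the default mca param file, handle_noop_binding() returns 0 silently; with no value set, the current behavior (error) is kept.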

Also this is a good opportunity to fix the other issue I talked about in
the first message in this thread: the tag
"odls-default:could-not-bind-to-socket" does not exist in
orte/mca/odls/default/help-odls-default.txt
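
A possible entry is sketched below, assuming the usual "[tag]" section layout of the help-*.txt files. The wording is only a suggestion and is not taken from the source tree:

```
[odls-default:could-not-bind-to-socket]
A request was made to bind a process to a socket, but the resulting
binding was either empty or covered all of the cpus available on the
node.
```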

Regards,
Nadia
> 
> 
> Fairly trivial to implement, and Bull could set the default mca param
> file to "quiet" to get what they want. I'm not sure if that's what the
> community wants or not - like I said, it makes no diff to me so long
> as the code logic is understandable.
> 
> 
> 
> On Apr 12, 2010, at 8:27 AM, Terry Dontje wrote:
> 
> > Ralph, I guess I am curious why is it that if there is only one
> > socket we cannot bind to it?  Does plpa actually error on this or is
> > this a condition we decided was an error at odls?
> > 
> > I am somewhat torn on whether this makes sense.  On the one hand, the
> > binding is definitely useless as to the result if you allow it.
> > However, if you don't allow it and you have a script or are running
> > tests on multiple systems, it would be nice to have this run, because
> > you are not really running into a resource starvation issue.
> > 
> > At a minimum I think the error condition/message needs to be spelled
> > out (defined).  As to whether we allow binding when only one socket
> > exists, I could go either way, slightly leaning towards allowing such
> > a specification to work.
> > 
> > --td
> > 
> > 
> > Ralph Castain wrote: 
> > > Guess I'll jump in here as I finally had a few minutes to look at the 
> > > code and think about your original note. In fact, I believe your original 
> > > statement is the source of contention.
> > > 
> > > If someone tells us -bind-to-socket, but there is only one socket, then 
> > > we really cannot bind them to anything. Any check by their code would 
> > > reveal that they had not, in fact, been bound - raising questions as to 
> > > whether or not OMPI is performing the request. Our operating standard has 
> > > been to error out if the user specifies something we cannot do to avoid 
> > > that kind of confusion. This is what generated the code in the system 
> > > today.
> > > 
> > > Now I can see an argument that -bind-to-socket with one socket maybe 
> > > shouldn't generate an error, but that decision then has to get reflected 
> > > in other code areas as well.
> > > 
> > > As for the test you cite -  it actually performs a valuable function and 
> > > was added to catch specific scenarios. In particular, if you follow the 
> > > code flow up just a little, you will see that it is possible to complete 
> > > the loop without ever actually setting a bit in the mask. This happens 
> > > when none of the cpus in that socket have been assigned to us via an 
> > > external bind. People actually use that as a means of suballocating 
> > > nodes, so the test needs to be there. Again, if the user said "bind to 
> > > socket", but none of that socket's cores are assigned for our use, that 
> > > is an error.
> > > 
> > > I haven't looked at your specific fix, but I agree with Terry's question. 
> > > It seems to me that whether or not we were externally bound is 
> > > irrelevant. Even if the overall result is what you want, I think a more 
> > > logically understandable test would help others reading the code.
> > > 
> > > But first we need to resolve the question: should this scenario return an 
> > > error or not?
> > > 
> > > 
> > > On Apr 12, 2010, at 1:43 AM, Nadia Derbey wrote:
> > > 
> > >   
> > > > On Fri, 2010-04-09 at 14:23 -0400, Terry Dontje wrote:
> > > >     
> > > > > Ralph Castain wrote: 
> > > > >       
> > > > > > Okay, just wanted to ensure everyone was working from the same base
> > > > > > code. 
> > > > > > 
> > > > > > 
> > > > > > Terry, Brad: you might want to look this proposed change over.
> > > > > > Something doesn't quite look right to me, but I haven't really
> > > > > > walked through the code to check it.
> > > > > > 
> > > > > > 
> > > > > >         
> > > > > At first blush I don't really get the usage of orte_odls_globals.bound
> > > > > in your patch.  It would seem to me that the insertion of that
> > > > > conditional would prevent the check it surrounds from being done when
> > > > > the process has not been bound prior to startup, which is a common case.
> > > > >       
> > > > Well, if you have a look at the algo in the ORTE_BIND_TO_SOCKET path
> > > > (odls_default_fork_local_proc() in odls_default_module.c):
> > > > 
> > > > <set target_socket depending on the desired mapping>
> > > > <set my paffinity mask to 0>       (line 715)
> > > > <for each core in the socket> {
> > > >    <get the associated phys_core>
> > > >    <get the associated phys_cpu>
> > > >    <if we are bound (orte_odls_globals.bound)> {
> > > >        <if phys_cpu does not belong to the cpus I'm bound to>
> > > >            continue
> > > >    }
> > > >    <set phys_cpu bit in my affinity mask>
> > > > }
> > > > <check if something is set in my affinity mask>
> > > > ...
> > > > 
> > > > 
> > > > What I'm saying is that the only way to have nothing set in the affinity
> > > > mask (which would justify the last test) is to have never called the
> > > > <set phys_cpu in my affinity mask> instruction. This means:
> > > >  . the test on orte_odls_globals.bound is true
> > > >  . <continue> is called for all the cores in the socket.
> > > > 
> > > > In the other path, what we are doing is checking if we have set one or
> > > > more bits in a mask after having actually set them: don't you think it's
> > > > useless?
> > > > 
> > > > That's why I'm suggesting to call the last check only if
> > > > orte_odls_globals.bound is true.
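
To make the argument above concrete, here is a small standalone model of that loop. It is a sketch, not the odls code: cpu_mask_t, socket_bind_result_t, build_socket_mask() and the contiguous cpu numbering are assumptions made for the example, and externally_bound stands in for orte_odls_globals.bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical affinity mask: bit i set means logical cpu i is allowed. */
typedef uint64_t cpu_mask_t;

typedef struct {
    cpu_mask_t mask;   /* cpus of the target socket that we may use */
    bool error;        /* true when no cpu of the socket is usable */
} socket_bind_result_t;

/* Build the bind-to-socket mask the way the pseudo-code does.
 * external_mask is only consulted when externally_bound is true. */
socket_bind_result_t build_socket_mask(int target_socket, int cores_per_socket,
                                       bool externally_bound,
                                       cpu_mask_t external_mask)
{
    socket_bind_result_t result = { 0, false };

    /* The model only handles up to 64 cpus and non-empty sockets. */
    assert(target_socket >= 0 && cores_per_socket > 0);
    assert((target_socket + 1) * cores_per_socket <= 64);

    for (int core = 0; core < cores_per_socket; core++) {
        /* Assumed numbering: cpus are contiguous, socket by socket. */
        int phys_cpu = target_socket * cores_per_socket + core;
        cpu_mask_t bit = (cpu_mask_t)1 << phys_cpu;

        if (externally_bound && (external_mask & bit) == 0) {
            continue;   /* this cpu was not assigned to us */
        }
        result.mask |= bit;
    }

    /* The mask can only be empty if every iteration hit "continue",
     * which requires externally_bound to be true, so the check is
     * only made in that case. */
    if (externally_bound && result.mask == 0) {
        result.error = true;
    }
    return result;
}
```

For a 4-core socket with no external binding the mask always comes out as 0xF, so the emptiness check can never fire. It only fires when the external binding excludes every cpu of the target socket, which is the suballocation case Ralph describes.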
> > > > 
> > > > Regards,
> > > > Nadia
> > > >     
> > > > > --td
> > > > > 
> > > > > 
> > > > >       
> > > > > > On Apr 9, 2010, at 9:33 AM, Terry Dontje wrote:
> > > > > > 
> > > > > >         
> > > > > > > Nadia Derbey wrote: 
> > > > > > >           
> > > > > > > > On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote:
> > > > > > > > 
> > > > > > > >             
> > > > > > > > > Just to check: is this with the latest trunk? Brad and Terry 
> > > > > > > > > have been making changes to this section of code, including 
> > > > > > > > > modifying the PROCESS_IS_BOUND test...
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >               
> > > > > > > > Well, it was on the v1.5. But I just checked: looks like
> > > > > > > >  1. the call to OPAL_PAFFINITY_PROCESS_IS_BOUND is still there in
> > > > > > > >     odls_default_fork_local_proc()
> > > > > > > >  2. OPAL_PAFFINITY_PROCESS_IS_BOUND() is defined the same way
> > > > > > > > 
> > > > > > > > But, I'll give it a try with the latest trunk.
> > > > > > > > 
> > > > > > > > Regards,
> > > > > > > > Nadia
> > > > > > > > 
> > > > > > > > 
> > > > > > > >             
> > > > > > > The changes I've made do not touch
> > > > > > > OPAL_PAFFINITY_PROCESS_IS_BOUND at all.  Also, I am only touching
> > > > > > > code related to the "bind-to-core" option, so I really doubt that my
> > > > > > > changes are causing issues here.
> > > > > > > 
> > > > > > > --td
> > > > > > >           
> > > > > > > > > On Apr 9, 2010, at 3:39 AM, Nadia Derbey wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >               
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > I am facing a problem with a test that runs fine on some nodes, and
> > > > > > > > > > fails on others.
> > > > > > > > > > 
> > > > > > > > > > I have a heterogeneous cluster, with 3 types of nodes:
> > > > > > > > > > 1) Single socket, 4 cores
> > > > > > > > > > 2) 2 sockets, 4 cores per socket
> > > > > > > > > > 3) 2 sockets, 6 cores per socket
> > > > > > > > > > 
> > > > > > > > > > I am using:
> > > > > > > > > > . salloc to allocate the nodes,
> > > > > > > > > > . mpirun binding/mapping options "-bind-to-socket -bysocket"
> > > > > > > > > > 
> > > > > > > > > > # salloc -N 1 mpirun -n 4 -bind-to-socket -bysocket sleep 900
> > > > > > > > > > 
> > > > > > > > > > This command fails if the allocated node is of type #1 (single
> > > > > > > > > > socket/4 cpus).
> > > > > > > > > > BTW, in that case orte_show_help is referencing a tag
> > > > > > > > > > ("could-not-bind-to-socket") that does not exist in
> > > > > > > > > > help-odls-default.txt.
> > > > > > > > > > 
> > > > > > > > > > It succeeds, however, when run on nodes of type #2 or #3.
> > > > > > > > > > I think a "bind to socket" should not return an error on a single
> > > > > > > > > > socket machine, but rather be a noop.
> > > > > > > > > > 
> > > > > > > > > > The problem comes from the test
> > > > > > > > > > OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> > > > > > > > > > called in odls_default_fork_local_proc() after the binding to the
> > > > > > > > > > processor's socket has been done:
> > > > > > > > > > ========
> > > > > > > > > >   <snip>
> > > > > > > > > >   OPAL_PAFFINITY_CPU_ZERO(mask);
> > > > > > > > > >   for (n=0; n < orte_default_num_cores_per_socket; n++) {
> > > > > > > > > >       <snip>
> > > > > > > > > >       OPAL_PAFFINITY_CPU_SET(phys_cpu, mask);
> > > > > > > > > >   }
> > > > > > > > > >   /* if we did not bind it anywhere, then that is an error */
> > > > > > > > > >   OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> > > > > > > > > >   if (!bound) {
> > > > > > > > > >       orte_show_help("help-odls-default.txt",
> > > > > > > > > >                      "odls-default:could-not-bind-to-socket", true);
> > > > > > > > > >       ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
> > > > > > > > > >   }
> > > > > > > > > > ========
> > > > > > > > > > OPAL_PAFFINITY_PROCESS_IS_BOUND() will return true if there are bits
> > > > > > > > > > set in the mask *AND* the number of bits set is less than the number
> > > > > > > > > > of cpus on the machine. Thus on a single socket, 4 cores machine the
> > > > > > > > > > test will fail, while on the other kinds of machines it will succeed.
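
The arithmetic behind this can be checked with a small standalone model. It is a sketch only: cpu_mask_t, mask_count(), mask_is_bound() and socket_mask() are hypothetical stand-ins written for this example (they are not the OPAL macros), and cpus are assumed to be numbered contiguously socket by socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical affinity mask: bit i set means logical cpu i is allowed. */
typedef uint64_t cpu_mask_t;

/* Count the cpus set in the mask. */
int mask_count(cpu_mask_t mask)
{
    int count = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* "Bound" as described in the text: at least one bit is set AND fewer
 * bits are set than there are cpus on the machine. */
bool mask_is_bound(cpu_mask_t mask, int num_cpus)
{
    int set = mask_count(mask);
    return set > 0 && set < num_cpus;
}

/* Mask with every core of one socket set (contiguous numbering assumed,
 * at most 64 cpus in this model). */
cpu_mask_t socket_mask(int socket, int cores_per_socket)
{
    cpu_mask_t mask = 0;

    assert(socket >= 0 && cores_per_socket > 0);
    assert((socket + 1) * cores_per_socket <= 64);

    for (int core = 0; core < cores_per_socket; core++) {
        mask |= (cpu_mask_t)1 << (socket * cores_per_socket + core);
    }
    return mask;
}
```

Applied to the three node types: binding to socket 0 sets 4 of 4 cpus on type #1 (not "bound"), 4 of 8 on type #2 and 6 of 12 on type #3 (both "bound").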
> > > > > > > > > > 
> > > > > > > > > > Again, I think the problem could be solved by changing the algorithm,
> > > > > > > > > > and assuming that ORTE_BIND_TO_SOCKET on a single socket machine =
> > > > > > > > > > noop.
> > > > > > > > > > 
> > > > > > > > > > Another solution could be to call the test
> > > > > > > > > > OPAL_PAFFINITY_PROCESS_IS_BOUND() at the end of the loop only if we
> > > > > > > > > > are bound (orte_odls_globals.bound). Actually that is the only case
> > > > > > > > > > where I see a justification for this test (see attached patch).
> > > > > > > > > > 
> > > > > > > > > > And maybe both solutions could be mixed.
> > > > > > > > > > 
> > > > > > > > > > Regards,
> > > > > > > > > > Nadia
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > -- 
> > > > > > > > > > Nadia Derbey <nadia.der...@bull.net>
> > > > > > > > > > <001_fix_process_binding_test.patch>
> > > > > > > > > > _______________________________________________
> > > > > > > > > > devel mailing list
> > > > > > > > > > de...@open-mpi.org
> > > > > > > > > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > > > > > > > > 
> > > > > > > > > >                 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >               
> > > > > > > -- 
> > > > > > > Terry D. Dontje | Principal Software Engineer
> > > > > > > Developer Tools Engineering | +1.650.633.7054
> > > > > > > Oracle - Performance Technologies
> > > > > > > 95 Network Drive, Burlington, MA 01803
> > > > > > > Email terry.don...@oracle.com
> > > > > > > 
> > > > > > > 
> > > > > > >           
> > > > > >         
> > > > >       
> > > >     
> > > 
> > > 
> > >   
> > 
> > 
> 
> 
-- 
Nadia Derbey <nadia.der...@bull.net>
