On Tue, 2010-04-13 at 01:27 -0600, Ralph Castain wrote:
> On Apr 13, 2010, at 1:02 AM, Nadia Derbey wrote:
> 
> > On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
> >> By definition, if you bind to all available cpus in the OS, you are
> >> bound to nothing (i.e., "unbound") as your process runs on any
> >> available cpu.
> >> 
> >> 
> >> PLPA doesn't care, and I personally don't care. I was just explaining
> >> why it generates an error in the odls.
> >> 
> >> 
> >> A user app would detect its binding by (a) getting the affinity mask
> >> from the OS, and then (b) seeing if the bits are set to '1' for all
> >> available processors. If it is, then you are not bound - there is no
> >> mechanism available for checking "are the bits set only for the
> >> processors I asked to be bound to". The OS doesn't track what you
> >> asked for, it only tracks where you are bound - and a mask with all
> >> '1's is defined as "unbound".
> >> 
> >> 
> >> So the reason for my question was simple: a user asked us to "bind"
> >> their process. If their process checks to see if it is bound, it will
> >> return "no". The user would therefore be led to believe that OMPI had
> >> failed to execute their request, when in fact we did execute it - but
> >> the result was (as Nadia says) a "no-op".
> >> 
> >> 
> >> After talking with Jeff, I think he has the right answer. It is a
> >> method we have used elsewhere, so it isn't unexpected behavior.
> >> Basically, he proposed that we use an mca param to control this
> >> behavior:
> >> 
> >> 
> >> * default: generate an error message as the "bind" results in a no-op,
> >> and this is our current behavior
> >> 
> >> 
> >> * warn: generate a warning that the binding wound up being a "no-op",
> >> but continue working
> >> 
> >> 
> >> * quiet: just ignore it and keep going
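The three settings listed above could be dispatched roughly as follows. This is only a sketch of the proposal, not code from the ORTE tree: the enum, its value names, and `handle_noop_binding()` are hypothetical, and plain `fprintf` stands in for `orte_show_help`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names for the three proposed settings. */
typedef enum {
    NOOP_BIND_ERROR,   /* "default": treat the no-op binding as an error */
    NOOP_BIND_WARN,    /* "warn": report it, but keep going */
    NOOP_BIND_QUIET    /* "quiet": ignore it and keep going */
} noop_bind_policy_t;

/* Returns true if the launch should continue, false if it should abort. */
static bool handle_noop_binding(noop_bind_policy_t policy, const char *host)
{
    switch (policy) {
    case NOOP_BIND_WARN:
        fprintf(stderr, "warning: binding on %s is a no-op; continuing\n", host);
        return true;
    case NOOP_BIND_QUIET:
        return true;
    case NOOP_BIND_ERROR:
    default:
        fprintf(stderr, "error: binding on %s is a no-op\n", host);
        return false;
    }
}
```

With this shape, a site that wants silent behavior only has to change the policy value, which is the role the mca param would play.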
> > 
> > Excellent, I completely agree (though I would have put the 2nd star as
> > the default behavior, but never mind, I don't want to restart the
> > discussion ;-) )
> 
> I actually went back/forth on that as well - I personally think it might be 
> better to just have warn and quiet, with warn being the default. The warning 
> could be generated with orte_show_help so the messages would be consolidated 
> across nodes. Given that the enhanced paffinity behavior is fairly new, and 
> that no-one has previously raised this issue, I don't think the prior 
> behavior is relevant.
> 
> Would that make sense? If so, we could extend that to the other binding 
> options for consistency.

Sure!

Patch proposal attached.

Regards,
Nadia
> 
> > 
> > Also this is a good opportunity to fix the other issue I talked about in
> > the first message in this thread: the tag
> > "odls-default:could-not-bind-to-socket" does not exist in
> > orte/mca/odls/default/help-odls-default.txt
> 
> I'll take that one - my fault for missing it. I'll cross-check the other 
> messages as well. Thanks for catching it!
> 
> As for your other change: let me think on it. I -think- I understand your 
> logic, but honestly haven't had time to really walk through it properly. Got 
> an ORCM deadline to meet, but hope to break free towards the end of this week.
> 
> 
> > 
> > Regards,
> > Nadia
> >> 
> >> 
> >> Fairly trivial to implement, and Bull could set the default mca param
> >> file to "quiet" to get what they want. I'm not sure if that's what the
> >> community wants or not - like I said, it makes no diff to me so long
> >> as the code logic is understandable.
> >> 
> >> 
> >> 
> >> On Apr 12, 2010, at 8:27 AM, Terry Dontje wrote:
> >> 
> >>> Ralph, I guess I am curious why is it that if there is only one
> >>> socket we cannot bind to it?  Does plpa actually error on this or is
> >>> this a condition we decided was an error at odls?
> >>> 
> >>> I am somewhat torn on whether this makes sense.  On the one hand it
> >>> is definitely useless as to the result if you allow it.  However, if
> >>> you don't allow it and you have a script or are running tests on
> >>> multiple systems, it would be nice to have this run because you are
> >>> not really running into a resource starvation issue.
> >>> 
> >>> At a minimum I think the error condition/message needs to be spelled
> >>> out (defined).  As to whether we allow binding when only one
> >>> socket exists, I could go either way, slightly leaning towards allowing
> >>> such a specification to work.
> >>> 
> >>> --td
> >>> 
> >>> 
> >>> Ralph Castain wrote: 
> >>>> Guess I'll jump in here as I finally had a few minutes to look at the 
> >>>> code and think about your original note. In fact, I believe your 
> >>>> original statement is the source of contention.
> >>>> 
> >>>> If someone tells us -bind-to-socket, but there is only one socket, then 
> >>>> we really cannot bind them to anything. Any check by their code would 
> >>>> reveal that they had not, in fact, been bound - raising questions as to 
> >>>> whether or not OMPI is performing the request. Our operating standard 
> >>>> has been to error out if the user specifies something we cannot do to 
> >>>> avoid that kind of confusion. This is what generated the code in the 
> >>>> system today.
> >>>> 
> >>>> Now I can see an argument that -bind-to-socket with one socket maybe 
> >>>> shouldn't generate an error, but that decision then has to get reflected 
> >>>> in other code areas as well.
> >>>> 
> >>>> As for the test you cite -  it actually performs a valuable function and 
> >>>> was added to catch specific scenarios. In particular, if you follow the 
> >>>> code flow up just a little, you will see that it is possible to complete 
> >>>> the loop without ever actually setting a bit in the mask. This happens 
> >>>> when none of the cpus in that socket have been assigned to us via an 
> >>>> external bind. People actually use that as a means of suballocating 
> >>>> nodes, so the test needs to be there. Again, if the user said "bind to 
> >>>> socket", but none of that socket's cores are assigned for our use, that 
> >>>> is an error.
> >>>> 
> >>>> I haven't looked at your specific fix, but I agree with Terry's 
> >>>> question. It seems to me that whether or not we were externally bound is 
> >>>> irrelevant. Even if the overall result is what you want, I think a more 
> >>>> logically understandable test would help others reading the code.
> >>>> 
> >>>> But first we need to resolve the question: should this scenario return 
> >>>> an error or not?
> >>>> 
> >>>> 
> >>>> On Apr 12, 2010, at 1:43 AM, Nadia Derbey wrote:
> >>>> 
> >>>> 
> >>>>> On Fri, 2010-04-09 at 14:23 -0400, Terry Dontje wrote:
> >>>>> 
> >>>>>> Ralph Castain wrote: 
> >>>>>> 
> >>>>>>> Okay, just wanted to ensure everyone was working from the same base
> >>>>>>> code. 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Terry, Brad: you might want to look this proposed change over.
> >>>>>>> Something doesn't quite look right to me, but I haven't really
> >>>>>>> walked through the code to check it.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>> At first blush I don't really get the usage of orte_odls_globals.bound
> >>>>>> in your patch.  It would seem to me that the insertion of that
> >>>>>> conditional would prevent the check it surrounds from being done when
> >>>>>> the process has not been bound prior to startup, which is a common case.
> >>>>>> 
> >>>>> Well, if you have a look at the algo in the ORTE_BIND_TO_SOCKET path
> >>>>> (odls_default_fork_local_proc() in odls_default_module.c):
> >>>>> 
> >>>>> <set target_socket depending on the desired mapping>
> >>>>> <set my paffinity mask to 0>       (line 715)
> >>>>> <for each core in the socket> {
> >>>>>   <get the associated phys_core>
> >>>>>   <get the associated phys_cpu>
> >>>>>   <if we are bound (orte_odls_globals.bound)> {
> >>>>>       <if phys_cpu does not belong to the cpus I'm bound to>
> >>>>>           continue
> >>>>>   }
> >>>>>   <set phys_cpu bit in my affinity mask>
> >>>>> }
> >>>>> <check if something is set in my affinity mask>
> >>>>> ...
> >>>>> 
> >>>>> 
> >>>>> What I'm saying is that the only way to have nothing set in the affinity
> >>>>> mask (which would justify the last test) is to have never called the
> >>>>> <set phys_cpu in my affinity mask> instruction. This means:
> >>>>> . the test on orte_odls_globals.bound is true
> >>>>> . call <continue> for all the cores in the socket.
> >>>>> 
> >>>>> In the other path, what we are doing is checking if we have set one or
> >>>>> more bits in a mask after having actually set them: don't you think it's
> >>>>> useless?
> >>>>> 
> >>>>> That's why I'm suggesting to call the last check only if
> >>>>> orte_odls_globals.bound is true.
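The argument above can be checked with a small model of the loop. This is a sketch under stated assumptions, not the code in odls_default_module.c: the mask is a plain `unsigned long`, the socket's cpus are assumed to have contiguous ids starting at `first_cpu`, and `externally_bound` / `allowed` are hypothetical stand-ins for `orte_odls_globals.bound` and the set of cpus we were bound to before launch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the paffinity mask: bit i set = cpu i selected. */
typedef unsigned long cpu_mask_t;

/*
 * Build the affinity mask for one socket, following the pseudocode above.
 * When externally_bound is false, every core of the socket sets a bit, so
 * the resulting mask can never be empty.
 */
static cpu_mask_t build_socket_mask(int first_cpu, int ncores,
                                    bool externally_bound, cpu_mask_t allowed)
{
    cpu_mask_t mask = 0;                       /* set my mask to 0 */
    for (int n = 0; n < ncores; n++) {
        cpu_mask_t bit = 1UL << (first_cpu + n);
        if (externally_bound && !(allowed & bit)) {
            continue;                          /* cpu not assigned to us */
        }
        mask |= bit;                           /* set phys_cpu bit */
    }
    return mask;
}
```

In this model the mask comes back empty only when `externally_bound` is true and none of the socket's cpus are in `allowed`, which is the single case where the final "is anything set" check can fire.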
> >>>>> 
> >>>>> Regards,
> >>>>> Nadia
> >>>>> 
> >>>>>> --td
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>>> On Apr 9, 2010, at 9:33 AM, Terry Dontje wrote:
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> Nadia Derbey wrote: 
> >>>>>>>> 
> >>>>>>>>> On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote:
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> Just to check: is this with the latest trunk? Brad and Terry have 
> >>>>>>>>>> been making changes to this section of code, including modifying 
> >>>>>>>>>> the PROCESS_IS_BOUND test...
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>> Well, it was on the v1.5. But I just checked: looks like
> >>>>>>>>> 1. the call to OPAL_PAFFINITY_PROCESS_IS_BOUND is still there in
> >>>>>>>>>    odls_default_fork_local_proc()
> >>>>>>>>> 2. OPAL_PAFFINITY_PROCESS_IS_BOUND() is defined the same way
> >>>>>>>>> 
> >>>>>>>>> But, I'll give it a try with the latest trunk.
> >>>>>>>>> 
> >>>>>>>>> Regards,
> >>>>>>>>> Nadia
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>> The changes I've made do not touch
> >>>>>>>> OPAL_PAFFINITY_PROCESS_IS_BOUND at all.  Also, I am only touching
> >>>>>>>> code related to the "bind-to-core" option so I really doubt if my
> >>>>>>>> changes are causing issues here.
> >>>>>>>> 
> >>>>>>>> --td
> >>>>>>>> 
> >>>>>>>>>> On Apr 9, 2010, at 3:39 AM, Nadia Derbey wrote:
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> 
> >>>>>>>>>>> I am facing a problem with a test that runs fine on some nodes
> >>>>>>>>>>> and fails on others.
> >>>>>>>>>>> 
> >>>>>>>>>>> I have a heterogeneous cluster, with 3 types of nodes:
> >>>>>>>>>>> 1) 1 socket, 4 cores
> >>>>>>>>>>> 2) 2 sockets, 4 cores per socket
> >>>>>>>>>>> 3) 2 sockets, 6 cores per socket
> >>>>>>>>>>> 
> >>>>>>>>>>> I am using:
> >>>>>>>>>>> . salloc to allocate the nodes,
> >>>>>>>>>>> . mpirun binding/mapping options "-bind-to-socket -bysocket"
> >>>>>>>>>>> 
> >>>>>>>>>>> # salloc -N 1 mpirun -n 4 -bind-to-socket -bysocket sleep 900
> >>>>>>>>>>> 
> >>>>>>>>>>> This command fails if the allocated node is of type #1 (single
> >>>>>>>>>>> socket/4 cpus).
> >>>>>>>>>>> BTW, in that case orte_show_help is referencing a tag
> >>>>>>>>>>> ("could-not-bind-to-socket") that does not exist in
> >>>>>>>>>>> help-odls-default.txt.
> >>>>>>>>>>> 
> >>>>>>>>>>> It succeeds, however, when run on nodes of type #2 or #3.
> >>>>>>>>>>> I think a "bind to socket" should not return an error on a single
> >>>>>>>>>>> socket machine, but rather be a no-op.
> >>>>>>>>>>> 
> >>>>>>>>>>> The problem comes from the test
> >>>>>>>>>>> OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> >>>>>>>>>>> called in odls_default_fork_local_proc() after the binding to the
> >>>>>>>>>>> processor's socket has been done:
> >>>>>>>>>>> ========
> >>>>>>>>>>>  <snip>
> >>>>>>>>>>>  OPAL_PAFFINITY_CPU_ZERO(mask);
> >>>>>>>>>>>  for (n=0; n < orte_default_num_cores_per_socket; n++) {
> >>>>>>>>>>>      <snip>
> >>>>>>>>>>>      OPAL_PAFFINITY_CPU_SET(phys_cpu, mask);
> >>>>>>>>>>>  }
> >>>>>>>>>>>  /* if we did not bind it anywhere, then that is an error */
> >>>>>>>>>>>  OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
> >>>>>>>>>>>  if (!bound) {
> >>>>>>>>>>>      orte_show_help("help-odls-default.txt",
> >>>>>>>>>>>                     "odls-default:could-not-bind-to-socket", true);
> >>>>>>>>>>>      ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
> >>>>>>>>>>>  }
> >>>>>>>>>>> ========
> >>>>>>>>>>> OPAL_PAFFINITY_PROCESS_IS_BOUND() will return true if there are bits
> >>>>>>>>>>> set in the mask *AND* the number of bits set is less than the number
> >>>>>>>>>>> of cpus on the machine. Thus on a single-socket, 4-core machine the
> >>>>>>>>>>> test will fail, while on the other kinds of machines it will succeed.
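The rule described above can be written out as a small portable sketch. It is not the actual macro from the OPAL paffinity headers: the mask is modeled as a plain `unsigned long`, and `cpu_mask_t`, `count_set_bits()` and `process_is_bound()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the paffinity mask: bit i set = cpu i selected. */
typedef unsigned long cpu_mask_t;

/* Count the bits set in the mask. */
static int count_set_bits(cpu_mask_t mask)
{
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* "Bound" as described above: at least one bit set, but fewer than all cpus. */
static bool process_is_bound(cpu_mask_t mask, int num_cpus)
{
    int nset = count_set_bits(mask);
    return nset > 0 && nset < num_cpus;
}
```

Under this model, `process_is_bound(0x0F, 4)` is false on the single-socket, 4-core node because all four bits are set, while the same socket mask on the 2 x 4-core node, `process_is_bound(0x0F, 8)`, is true.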
> >>>>>>>>>>> 
> >>>>>>>>>>> Again, I think the problem could be solved by changing the
> >>>>>>>>>>> algorithm, and assuming that ORTE_BIND_TO_SOCKET on a single socket
> >>>>>>>>>>> machine = no-op.
> >>>>>>>>>>> 
> >>>>>>>>>>> Another solution could be to call the test
> >>>>>>>>>>> OPAL_PAFFINITY_PROCESS_IS_BOUND() at the end of the loop only if we
> >>>>>>>>>>> are bound (orte_odls_globals.bound). Actually that is the only case
> >>>>>>>>>>> where I see a justification for this test (see attached patch).
> >>>>>>>>>>> 
> >>>>>>>>>>> And maybe both solutions could be mixed.
> >>>>>>>>>>> 
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Nadia
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> -- 
> >>>>>>>>>>> Nadia Derbey <nadia.der...@bull.net>
> >>>>>>>>>>> <001_fix_process_binding_test.patch>
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> devel mailing list
> >>>>>>>>>>> de...@open-mpi.org
> >>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>> -- 
> >>>>>>>> Terry D. Dontje | Principal Software Engineer
> >>>>>>>> Developer Tools Engineering | +1.650.633.7054
> >>>>>>>> Oracle - Performance Technologies
> >>>>>>>> 95 Network Drive, Burlington, MA 01803
> >>>>>>>> Email terry.don...@oracle.com
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >>> 
> >> 
> >> 
> 
-- 
Nadia Derbey <nadia.der...@bull.net>
When bind-to-socket is requested, do not unconditionally error out if we are running on a single-socket node.

diff -r 0b851b2e7934 orte/mca/odls/default/help-odls-default.txt
--- a/orte/mca/odls/default/help-odls-default.txt	Thu Mar 18 16:10:25 2010 +0100
+++ b/orte/mca/odls/default/help-odls-default.txt	Tue Apr 13 13:40:12 2010 +0200
@@ -130,3 +130,13 @@ binding action:
   Application name:  %s

 Please revise the request and try again.
+#
+[odls-default:warn-not-bound-to-socket]
+A request to bind the processes to a socket was made, but the local host
+only contains a single socket.
+This will result in the processes being unbound.
+Continuing anyway.
+
+  Local host:        %s
+  Action requested:  %s
+  Application name:  %s
diff -r 0b851b2e7934 orte/mca/odls/default/odls_default.h
--- a/orte/mca/odls/default/odls_default.h	Thu Mar 18 16:10:25 2010 +0100
+++ b/orte/mca/odls/default/odls_default.h	Tue Apr 13 13:40:12 2010 +0200
@@ -36,6 +36,7 @@ BEGIN_C_DECLS
 int orte_odls_default_component_open(void);
 int orte_odls_default_component_close(void);
 int orte_odls_default_component_query(mca_base_module_t **module, int *priority);
+int orte_odls_default_component_register(void);

 /*
  * ODLS Default module
@@ -46,6 +47,8 @@ ORTE_MODULE_DECLSPEC extern orte_odls_ba
 /* dedicated debug output flag */
 ORTE_MODULE_DECLSPEC extern bool orte_odls_default_report_bindings;

+ORTE_DECLSPEC extern bool orte_odls_default_warn_if_not_bound;
+
 END_C_DECLS

 #endif /* ORTE_ODLS_H */
diff -r 0b851b2e7934 orte/mca/odls/default/odls_default_component.c
--- a/orte/mca/odls/default/odls_default_component.c	Thu Mar 18 16:10:25 2010 +0100
+++ b/orte/mca/odls/default/odls_default_component.c	Tue Apr 13 13:40:12 2010 +0200
@@ -31,12 +31,17 @@
 #endif
 #include <ctype.h>

+#include "opal/mca/mca.h"
+#include "opal/mca/base/base.h"
+#include "opal/mca/base/mca_base_param.h"
+
 #include "orte/mca/odls/odls.h"
 #include "orte/mca/odls/base/odls_private.h"
 #include "orte/mca/odls/default/odls_default.h"

 /* instantiate a module-global variable */
 bool orte_odls_default_report_bindings;
+bool orte_odls_default_warn_if_not_bound;

 /*
  * Instantiate the public struct with all of our public information
@@ -57,7 +62,8 @@ orte_odls_base_component_t mca_odls_defa
         /* Component open and close functions */
         orte_odls_default_component_open,
         orte_odls_default_component_close,
-        orte_odls_default_component_query
+        orte_odls_default_component_query,
+        orte_odls_default_component_register
     },
     {
         /* The component is checkpoint ready */
@@ -72,6 +78,17 @@ int orte_odls_default_component_open(voi
     return ORTE_SUCCESS;
 }

+int orte_odls_default_component_register(void)
+{
+    mca_base_param_reg_int(&mca_odls_default_component.version,
+                           "warn_if_not_bound",
+                           "If nonzero, issue a warning if the program asked "
+                           "for a binding that results in a no-op (ex: "
+                           "bind-to-socket on a single socket node)",
+                           false, false, 1,
+                           &orte_odls_default_warn_if_not_bound);
+    return ORTE_SUCCESS;
+}

 int orte_odls_default_component_query(mca_base_module_t **module, int *priority)
 {
diff -r 0b851b2e7934 orte/mca/odls/default/odls_default_module.c
--- a/orte/mca/odls/default/odls_default_module.c	Thu Mar 18 16:10:25 2010 +0100
+++ b/orte/mca/odls/default/odls_default_module.c	Tue Apr 13 13:40:12 2010 +0200
@@ -750,9 +750,19 @@ static int odls_default_fork_local_proc(
                 /* if we did not bind it anywhere, then that is an error */
                 OPAL_PAFFINITY_PROCESS_IS_BOUND(mask, &bound);
                 if (!bound) {
-                    orte_show_help("help-odls-default.txt",
-                                   "odls-default:could-not-bind-to-socket", true);
-                    ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
+                    if (orte_odls_globals.bound) {
+                        orte_show_help("help-odls-default.txt",
+                                       "odls-default:could-not-bind-to-socket", true);
+                        ORTE_ODLS_ERROR_OUT(ORTE_ERR_FATAL);
+                    } else {
+                        if (orte_odls_default_warn_if_not_bound) {
+                            orte_show_help("help-odls-default.txt",
+                                           "odls-default:warn-not-bound-to-socket",
+                                           true,
+                                           orte_process_info.nodename,
+                                           "bind-to-socket", context->app);
+                        }
+                    }
                 }
                 if (orte_report_bindings) {
                     opal_output(0, "%s odls:default:fork binding child %s to socket %d cpus %04lx",
