No worries. Thanks for the adjustment.

Cheers,
Josh

On Mar 26, 2008, at 3:10 PM, Jeff Squyres wrote:
Sorry about that, Josh -- thanks for fixing it.

I added one more very minor change on top of r17980.


On Mar 26, 2008, at 10:55 AM, Josh Hursey wrote:
My fix is in r17980. I did some limited testing with and without C/R
and things look fine. Wider testing may be in order, but I think MTT
should take care of that this evening.

Cheers,
Josh

On Mar 26, 2008, at 10:40 AM, Josh Hursey wrote:

Jeff,

I think this commit is not quite correct. I'm working on a patch to
fix it at the moment, but just wanted to give a heads up for anyone
who is experiencing the same problem I am.

Before this commit I could set "opal_event_include=select" in
my .openmpi/mca-params.conf file and the event engine would only use
'select' for all OMPI/ORTE processes. This commit overrides that
selection by forcing all MPI apps to use "all". I noticed the breakage
because the FT builds (which require 'select' at the moment) were
failing.

The fix might be as easy as checking whether the user specified
anything other than the default, and forcing "all" only if the user
did not define anything. Thoughts?
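
A minimal sketch of that check, in plain C. The helper name
pick_event_include and its three arguments are hypothetical and are not
part of the Open MPI MCA API; the sketch only models the precedence
described above and is not necessarily what r17980 does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function).
 *
 * user_value:      what the user configured, or NULL if nothing was set
 * builtin_default: the registration default (e.g. "poll" or "select")
 * forced_value:    what the upper layer would like to force (e.g. "all")
 *
 * Returns the user's value if it differs from the built-in default;
 * otherwise returns the forced value. */
static const char *pick_event_include(const char *user_value,
                                      const char *builtin_default,
                                      const char *forced_value)
{
    assert(builtin_default != NULL);
    assert(forced_value != NULL);

    /* Respect an explicit, non-default user selection. */
    if (user_value != NULL && strcmp(user_value, builtin_default) != 0) {
        return user_value;
    }

    /* Nothing (or only the default) was specified: safe to force. */
    return forced_value;
}
```

With a built-in default of "poll" and a forced value of "all", a user
setting of "select" is kept, while an unset parameter falls through to
"all".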

-- Josh

On Mar 25, 2008, at 1:18 PM, jsquy...@osl.iu.edu wrote:

Author: jsquyres
Date: 2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
New Revision: 17956
URL: https://svn.open-mpi.org/trac/ompi/changeset/17956

Log:
Fix #1253: default libevent to use select/poll and only use the other
mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
otherwise.  See big comment in opal/event/event.c for a full
explanation.

Text files modified:
trunk/ompi/runtime/ompi_mpi_init.c |    32 +++++++++++++++++++++++++++-----
trunk/opal/event/event.c           |    37 ++++++++++++++++++++++++++++++++-----
trunk/orte/orted/orted_main.c      |    31 -------------------------------
3 files changed, 59 insertions(+), 41 deletions(-)

Modified: trunk/ompi/runtime/ompi_mpi_init.c
===================================================================
--- trunk/ompi/runtime/ompi_mpi_init.c  (original)
+++ trunk/ompi/runtime/ompi_mpi_init.c 2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
@@ -234,15 +234,37 @@
  /* see comment below about sched_yield */
  int num_processors;
#endif
-
- /* Join the run-time environment - do the things that don't hit
-       the registry */

-    if (ORTE_SUCCESS != (ret = opal_init())) {
-        error = "ompi_mpi_init: opal_init failed";
+    /* Setup enough to check get/set MCA params */
+
+    if (ORTE_SUCCESS != (ret = opal_init_util())) {
+        error = "ompi_mpi_init: opal_init_util failed";
      goto error;
  }

+    /* _After_ opal_init_util() but _before_ orte_init(), we need to
+       set an MCA param that tells libevent that it's ok to use any
+       mechanism in libevent that is available on this platform (e.g.,
+       epoll and friends).  Per opal/event/event.c, we default to
+       select/poll -- but we know that MPI processes won't be using
+       pty's with the event engine, so it's ok to relax this
+       constraint and let any fd-monitoring mechanism be used. */
+    ret = mca_base_param_reg_string_name("opal", "event_include",
+                                         "Internal orted MCA param: tell opal_init() to use a specific mechanism in libevent",
+                                         false, false, "all", NULL);
+    if (ret >= 0) {
+        /* We have to explicitly "set" the MCA param value here
+           because libevent initialization will re-register the MCA
+           param and therefore override the default.  Setting the value
+           here puts the desired value ("all") in different storage
+           that is not overwritten if/when the MCA param is
+           re-registered.  Note that we do *NOT* set this value as an
+           environment variable, just so that it won't be inherited by
+           any spawned processes and potentially cause unintended
+           side-effects with launching ORTE tools... */
+        mca_base_param_set_string(ret, "all");
+    }
+
  /* check to see if we want timing information */
  param = mca_base_param_reg_int_name("ompi", "timing",
                                      "Request that critical timing loops be measured",

Modified: trunk/opal/event/event.c
===================================================================
--- trunk/opal/event/event.c    (original)
+++ trunk/opal/event/event.c    2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
@@ -256,15 +256,42 @@

#if OPAL_HAVE_WORKING_EVENTOPS

-    /**
-     * Retrieve the upper level specified event system, if any.
+    /* Retrieve the upper level specified event system, if any.
+     * Default to select() on OS X and poll() everywhere else because
+     * various parts of OMPI / ORTE use libevent with pty's.  pty's
+     * *only* work with select on OS X (tested on Tiger and Leopard);
+     * we *know* that both select and poll work with pty's everywhere
+     * else we care about (other mechanisms such as epoll *may* work
+     * with pty's -- we have not tested comprehensively with newer
+     * versions of Linux, etc.).  So the safe thing to do is:
+     *
+     * - On OS X, default to using "select" only
+     * - Everywhere else, default to using "poll" only (because poll
+     *   is more scalable than select)
+     *
+     * An upper layer may override this setting if it knows that pty's
+     * won't be used with libevent.  For example, we currently have
+     * ompi_mpi_init() set to use "all" (to include epoll and friends)
+     * so that the TCP BTL can be a bit more scalable -- because we
+     * *know* that MPI apps don't use pty's with libevent.
+     * Note that other tools explicitly *do* use pty's with libevent:
+     *
+     * - orted
+     * - orterun (probably only if it launches locally)
+     * - ...?
   */
  mca_base_param_reg_string_name("opal", "event_include",
-                                   "Comma-delimited list of libevent subsystems to use (kqueue, devpoll, epoll, poll, select, and rtsig)",
-                                   false, false, "all", &event_module_include);
+                                   "Comma-delimited list of libevent subsystems to use (kqueue, devpoll, epoll, poll, select, and rtsig -- depending on your platform)",
+                                   false, false,
+#ifdef __APPLE__
+                                   "select",
+#else
+                                   "poll",
+#endif
+                                   &event_module_include);
  if (NULL == event_module_include) {
      /* Shouldn't happen, but... */
-        event_module_include = strdup("all");
+        event_module_include = strdup("select");
  }
  opal_event_module_include = opal_argv_split(event_module_include,',');
  free(event_module_include);

Modified: trunk/orte/orted/orted_main.c
===================================================================
--- trunk/orte/orted/orted_main.c       (original)
+++ trunk/orte/orted/orted_main.c       2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
@@ -178,7 +178,6 @@
  char log_file[PATH_MAX];
  char *jobidstring;
  char *rml_uri;
-    char *tmp1, *tmp2;
  int i;
  opal_buffer_t *buffer;
  char hostname[100];
@@ -264,36 +263,6 @@
      if (1000 < i) i=0;
  }

-    /* _After_ opal_init_util() (and various other bookkeeping) but
-       _before_ orte_init(), we need to set an MCA param that tells
-       the orted not to use any other libevent mechanism except
-       "select" or "poll" (per potential pty issues with scalable
-       fd-monitoring mechanisms such as epoll() and friends -- these
-       issues *may* have been fixed in later OS releases and/or newer
-       versions of libevent, but we weren't willing to do all the
-       testing to figure it out.  So force the orted to use
-       select()/poll() *only* -- there's so few fd's in the orted that
-       it really doesn't matter.
-
-       Note that pty's work fine with poll() on most systems, so we
-       prefer that (because it's more scalable than select()).
-       However, poll() does *not* work with ptys on OS X, so we use
-       select() there. */
-    mca_base_param_reg_string_name("opal", "event_include",
-                                   "Internal orted MCA param: tell opal_init() to use a specific mechanism in libevent",
-                                   true, true,
-#ifdef __APPLE__
-                                   "select",
-#else
-                                   "poll",
-#endif
-                                   NULL);
-    tmp1 = mca_base_param_environ_variable("opal", NULL, "event_include");
-    asprintf(&tmp2, "%s=select", tmp1);
-    putenv(tmp2);
-    free(tmp1);
-    free(tmp2);
-
  /* Okay, now on to serious business! */

  if (orted_globals.hnp) {
_______________________________________________
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
