Jeff,

I think this commit is not quite correct. I'm working on a patch to fix it at the moment, but wanted to give a heads up to anyone who is experiencing the same problem I am.

Before this commit I could set "opal_event_include=select" in my .openmpi/mca-params.conf file and the event engine would use only 'select' for all OMPI/ORTE processes. This commit overrides that selection by forcing all MPI apps to use "all". I noticed the breakage because the FT builds (which require 'select' at the moment) were failing.

The fix might be as easy as checking whether the user specified anything other than the default, and forcing "all" only if the user did not define anything. Thoughts?

-- Josh

On Mar 25, 2008, at 1:18 PM, jsquy...@osl.iu.edu wrote:

Author: jsquyres
Date: 2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
New Revision: 17956
URL: https://svn.open-mpi.org/trac/ompi/changeset/17956

Log:
Fix #1253: default libevent to use select/poll and only use the other
mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
otherwise.  See big comment in opal/event/event.c for a full
explanation.

Text files modified:
trunk/ompi/runtime/ompi_mpi_init.c | 32 +++++++++++++++++++++++++++-----
trunk/opal/event/event.c | 37 ++++++++++++++++++++++++++++++++-----
trunk/orte/orted/orted_main.c | 31 -------------------------------
  3 files changed, 59 insertions(+), 41 deletions(-)

Modified: trunk/ompi/runtime/ompi_mpi_init.c
===================================================================
--- trunk/ompi/runtime/ompi_mpi_init.c  (original)
+++ trunk/ompi/runtime/ompi_mpi_init.c 2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
@@ -234,15 +234,37 @@
    /* see comment below about sched_yield */
    int num_processors;
#endif
-
-    /* Join the run-time environment - do the things that don't hit
-       the registry */

-    if (ORTE_SUCCESS != (ret = opal_init())) {
-        error = "ompi_mpi_init: opal_init failed";
+    /* Setup enough to check get/set MCA params */
+
+    if (ORTE_SUCCESS != (ret = opal_init_util())) {
+        error = "ompi_mpi_init: opal_init_util failed";
        goto error;
    }

+    /* _After_ opal_init_util() but _before_ orte_init(), we need to
+       set an MCA param that tells libevent that it's ok to use any
+       mechanism in libevent that is available on this platform (e.g.,
+       epoll and friends).  Per opal/event/event.c, we default to
+       select/poll -- but we know that MPI processes won't be using
+       pty's with the event engine, so it's ok to relax this
+       constraint and let any fd-monitoring mechanism be used. */
+    ret = mca_base_param_reg_string_name("opal", "event_include",
+                                         "Internal orted MCA param: tell opal_init() to use a specific mechanism in libevent",
+                                         false, false, "all", NULL);
+    if (ret >= 0) {
+        /* We have to explicitly "set" the MCA param value here
+           because libevent initialization will re-register the MCA
+           param and therefore override the default. Setting the value
+           here puts the desired value ("all") in different storage
+           that is not overwritten if/when the MCA param is
+           re-registered.  Note that we do *NOT* set this value as an
+           environment variable, just so that it won't be inherited by
+           any spawned processes and potentially cause unintended
+           side-effects with launching ORTE tools... */
+        mca_base_param_set_string(ret, "all");
+    }
+
    /* check to see if we want timing information */
    param = mca_base_param_reg_int_name("ompi", "timing",
                                        "Request that critical timing loops be measured",

Modified: trunk/opal/event/event.c
===================================================================
--- trunk/opal/event/event.c    (original)
+++ trunk/opal/event/event.c 2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
@@ -256,15 +256,42 @@

#if OPAL_HAVE_WORKING_EVENTOPS

-    /**
-     * Retrieve the upper level specified event system, if any.
+    /* Retrieve the upper level specified event system, if any.
+     * Default to select() on OS X and poll() everywhere else because
+     * various parts of OMPI / ORTE use libevent with pty's.  pty's
+     * *only* work with select on OS X (tested on Tiger and Leopard);
+     * we *know* that both select and poll work with pty's everywhere
+     * else we care about (other mechanisms such as epoll *may* work
+     * with pty's -- we have not tested comprehensively with newer
+     * versions of Linux, etc.).  So the safe thing to do is:
+     *
+     * - On OS X, default to using "select" only
+     * - Everywhere else, default to using "poll" only (because poll
+     *   is more scalable than select)
+     *
+     * An upper layer may override this setting if it knows that pty's
+     * won't be used with libevent.  For example, we currently have
+     * ompi_mpi_init() set to use "all" (to include epoll and friends)
+     * so that the TCP BTL can be a bit more scalable -- because we
+     * *know* that MPI apps don't use pty's with libevent.
+     * Note that other tools explicitly *do* use pty's with libevent:
+     *
+     * - orted
+     * - orterun (probably only if it launches locally)
+     * - ...?
     */
    mca_base_param_reg_string_name("opal", "event_include",
-                                   "Comma-delimited list of libevent subsystems to use (kqueue, devpoll, epoll, poll, select, and rtsig)",
-                                   false, false, "all", &event_module_include);
+                                   "Comma-delimited list of libevent subsystems to use (kqueue, devpoll, epoll, poll, select, and rtsig -- depending on your platform)",
+                                   false, false,
+#ifdef __APPLE__
+                                   "select",
+#else
+                                   "poll",
+#endif
+                                   &event_module_include);
    if (NULL == event_module_include) {
        /* Shouldn't happen, but... */
-        event_module_include = strdup("all");
+        event_module_include = strdup("select");
    }
    opal_event_module_include = opal_argv_split(event_module_include,',');
    free(event_module_include);

Modified: trunk/orte/orted/orted_main.c
===================================================================
--- trunk/orte/orted/orted_main.c       (original)
+++ trunk/orte/orted/orted_main.c 2008-03-25 13:18:17 EDT (Tue, 25 Mar 2008)
@@ -178,7 +178,6 @@
    char log_file[PATH_MAX];
    char *jobidstring;
    char *rml_uri;
-    char *tmp1, *tmp2;
    int i;
    opal_buffer_t *buffer;
    char hostname[100];
@@ -264,36 +263,6 @@
        if (1000 < i) i=0;
    }

-    /* _After_ opal_init_util() (and various other bookkeeping) but
-       _before_ orte_init(), we need to set an MCA param that tells
-       the orted not to use any other libevent mechanism except
-       "select" or "poll" (per potential pty issues with scalable
-       fd-monitoring mechanisms such as epoll() and friends -- these
-       issues *may* have been fixed in later OS releases and/or newer
-       versions of libevent, but we weren't willing to do all the
-       testing to figure it out.  So force the orted to use
-       select()/poll() *only* -- there's so few fd's in the orted that
-       it really doesn't matter.
-
-       Note that pty's work fine with poll() on most systems, so we
-       prefer that (because it's more scalable than select()).
-       However, poll() does *not* work with ptys on OS X, so we use
-       select() there. */
-    mca_base_param_reg_string_name("opal", "event_include",
-                                   "Internal orted MCA param: tell opal_init() to use a specific mechanism in libevent",
-                                   true, true,
-#ifdef __APPLE__
-                                   "select",
-#else
-                                   "poll",
-#endif
-                                   NULL);
-    tmp1 = mca_base_param_environ_variable("opal", NULL, "event_include");
-    asprintf(&tmp2, "%s=select", tmp1);
-    putenv(tmp2);
-    free(tmp1);
-    free(tmp2);
-
    /* Okay, now on to serious business! */

    if (orted_globals.hnp) {
_______________________________________________
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn
