On Nov 13, 2007 12:41 PM, Brad Penoff <pen...@cs.ubc.ca> wrote:
> On Nov 12, 2007 3:26 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> > I have no objections to bringing this into the trunk, but I agree that
> > an .ompi_ignore is probably a good idea at first.
>
> I'll try to cook up a commit soon then!

It's in there now!
https://svn.open-mpi.org/trac/ompi/changeset/16723

A quick sanity test shows that things are operational. For others to
use it, they'll of course have to adjust .ompi_ignore (or .ompi_unignore).

We're playing with MTT now so I'd expect we'll have some questions on
that front in the near future.

Where is the best place to put BTL-specific documentation (for
example, some setup tips and weblinks)?

brad

>
> > One question that I'd like to have answered is how OMPI decides
> > whether to use the SCTP BTL or not. If there are SCTP stacks
> > available by default in Linux and OS X -- but their performance may be
> > sub-optimal and/or buggy, we may want to have the SCTP BTL only
> > activated if the user explicitly asks for it. Open MPI is very
> > concerned with "out of the box" behavior -- we need to ensure that
> > "mpirun a.out" will "just work" on all of our supported platforms.
>
> Just to make a few things explicit...
>
> Things would only work out of the box on FreeBSD, and there the stack
> is very good.
>
> We have less experience with the Linux stack but hope the availability
> of an SCTP BTL will help encourage its use by us and others. Now it
> is a module by default (loaded with "modprobe sctp") but the actual
> SCTP sockets extension API needs to be downloaded and installed
> separately. The so-called lksctp-tools can be obtained here:
> http://sourceforge.net/project/showfiles.php?group_id=26529
>
> The OS X stack does not come by default but instead is a kernel extension:
> http://sctp.fh-muenster.de/sctp-nke.html
> I haven't yet started this testing but intend to soon. As of now
> though, the supplied configure.m4 does not try to even build the
> component on Mac OS X.
>
> So in my opinion, things in the configure scripts should be fine the
> way they are since only the FreeBSD stack (which we have confidence in)
> will try to work out of the box; the others require the user to
> install things.
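
For anyone who wants to check whether a host exposes an SCTP stack at all before un-ignoring the component, here is a minimal sketch. It is not what the supplied configure.m4 does, and sctp_stack_available() is a hypothetical helper name, not an Open MPI function. It only asks the kernel for a one-to-one style SCTP socket; it says nothing about whether the sockets extension API from lksctp-tools is installed.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>   /* IPPROTO_SCTP */
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Open MPI): returns 1 if the kernel
 * accepts a one-to-one style SCTP socket, 0 otherwise. On failure *err
 * holds the errno reported by socket(), on success it is set to 0.
 */
static int sctp_stack_available(int *err)
{
    assert(err != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
    if (fd < 0) {
        *err = errno;   /* no usable SCTP stack for this process */
        return 0;
    }

    close(fd);          /* we only wanted to know whether it opens */
    *err = 0;
    return 1;
}
```

Calling sctp_stack_available(&err) from a small test program gives a quick yes/no answer; when it returns 0, strerror(err) explains why the socket could not be created.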
>
> A question I had was with respect to what to set for the default value
> of btl_sctp_exclusivity... I had wanted the exclusivity to be
> "slightly less than TCP" so it was available but not the default. In
> the code I set btl_sctp_exclusivity to this:
> MCA_BTL_EXCLUSIVITY_LOW - 1
> ...however MCA_BTL_EXCLUSIVITY_LOW is defined as 0 and ompi_info says
> that exclusivity must be >= 0... a -1 exclusivity doesn't seem to
> break anything though... If two BTLs have the same exclusivity, what
> is the tie-break? Alphabetic order?
>
> > Will UBC set up regular MTT runs to test the SCTP stuff? :-)
>
> We've only started playing with MTT so I'm sure we'll have plenty of
> questions as we begin this process!
>
> > More below.
> >
> > On Nov 10, 2007, at 9:25 PM, Brad Penoff wrote:
> >
> > >>> Currently, both the one-to-one and the one-to-many make use of the
> > >>> event library offered by Open MPI. The callback functions for the
> > >>> one-to-many style however are quite unique as multiple endpoints may
> > >>> be interested in the events that poll returns. Currently we use these
> > >>> unique callback functions, but in the future the hope is to play with
> > >>> the potential benefits of a btl_progress function, particularly for
> > >>> the one-to-many style.
> > >>
> > >> In my experience the event callbacks have a high overhead compared to a
> > >> progress function, so I'd say that's definitely worth checking out.
> > >
> > > We noticed that poll is only called after a timer goes off while
> > > btl_progress would be called with each iteration of opal_progress, so
> > > noticing that along with your encouragement makes us want to check it
> > > out even more.
> >
> > Be aware that based on discussions from the Paris meeting, some
> > changes to libevent are coming (I really need to get this on a wiki
> > page or something).
> > Here's a quick summary:
> >
> > - We're waiting for a new release of libevent (or libev -- we'll see
> >   how it shakes out) that has lots of bug fixes and performance
> >   improvements as compared to the version we currently have in the OMPI
> >   tree. Based on some libevent mailing list traffic, this release may
> >   be in Dec 2007. We'll see what happens.
> >
> > - After we update libevent, we'll be making a policy change w.r.t.
> >   OMPI progress functions and timer callbacks: only software layers with
> >   actual devices will be allowed to register progress functions (in
> >   particular, the io and osc framework progress functions will be
> >   eliminated; see below). All other progress-requiring functions will
> >   have to use timers. This means that every time we call progress, we
> >   *only* call the stuff that needs to be polled as frequently as
> >   possible. We'll call the less-important progress stuff less
> >   frequently (e.g., ORTE OOB/RML).
> >
> > - We'll be changing our use of libevent to utilize the more scalable
> >   polling capabilities (such as epoll and friends). We don't use them
> >   right now because on all OS's that we currently care about (Linux, OS
> >   X, Solaris), mixing the scalable fd polling mechanism with pty's
> >   results in Very Very Bad Things. We'll special case where pty's are
> >   used and only use select/poll there, and then use epoll (etc.)
> >   elsewhere.
> >
> > - We'll also be changing our use of libevent to utilize timers
> >   properly.
> >
> > - ompi_request_t will be augmented to have a callback that, if
> >   non-NULL, will be invoked when the request is completed. This will
> >   allow removing the io and osc framework progress functions.
> >
> > - We may also add a high-performance clock framework in Open MPI -- a
> >   way of accessing high-resolution timers and clocks on the host (e.g.,
> >   on Intel chips, additional algorithms are necessary to normalize the
> >   per-chip clocks between sockets, especially if a process bounces
> >   between sockets -- unnecessary on AMD, PPC, and SPARC platforms).
> >   This could improve performance and precision of the libevent timers.
> >
> > - Finally, registering progress functions will take a new parameter: a
> >   file descriptor. If a file descriptor is provided and opal_progress()
> >   decides that it wants to block (specific mechanism TBD, but probably
> >   something similar to what other hybrid polling/blocking systems do:
> >   poll for a while, and if nothing "interesting" happens, block) *and*
> >   if all registered progress functions have valid fd's, then we'll block
> >   until either a timer expires or something "interesting" happens.
>
> Thanks for the update on the things to come! We'll definitely keep an
> eye on things as they arrive.
>
> brad
>
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
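
To make the last bullet of the summary concrete (progress functions registered together with a file descriptor, and a progress call that polls for a while and then blocks), here is a rough sketch. The actual mechanism is described above as TBD, so none of this is Open MPI code: progress_engine, engine_register(), engine_progress(), drain_fd() and make_nonblocking() are all hypothetical names, and a plain poll() timeout stands in for "a timer expires".

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_PROGRESS 8

typedef int (*progress_fn)(void *ctx);  /* returns number of events handled */

struct progress_entry {
    progress_fn fn;
    void *ctx;
    int fd;                 /* -1 if this layer has nothing to block on */
};

struct progress_engine {
    struct progress_entry entries[MAX_PROGRESS];
    size_t count;
};

/* Register a progress function together with the fd it can block on. */
static int engine_register(struct progress_engine *e, progress_fn fn,
                           void *ctx, int fd)
{
    assert(fn != NULL);
    if (e->count == MAX_PROGRESS) return -1;
    e->entries[e->count].fn = fn;
    e->entries[e->count].ctx = ctx;
    e->entries[e->count].fd = fd;
    e->count++;
    return 0;
}

/* One pass over every registered progress function. */
static int engine_poll_once(struct progress_engine *e)
{
    int events = 0;
    for (size_t i = 0; i < e->count; i++)
        events += e->entries[i].fn(e->entries[i].ctx);
    return events;
}

/*
 * Spin for up to spin_limit passes. If nothing happened and every entry
 * has a valid fd, block in poll() for at most timeout_ms, then make one
 * more pass. Returns the number of events handled (0 if none).
 */
static int engine_progress(struct progress_engine *e, int spin_limit,
                           int timeout_ms)
{
    for (int i = 0; i < spin_limit; i++) {
        int events = engine_poll_once(e);
        if (events > 0) return events;
    }

    struct pollfd fds[MAX_PROGRESS];
    if (e->count == 0) return 0;
    for (size_t i = 0; i < e->count; i++) {
        if (e->entries[i].fd < 0) return 0;  /* someone cannot block */
        fds[i].fd = e->entries[i].fd;
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }
    if (poll(fds, (nfds_t)e->count, timeout_ms) <= 0)
        return 0;                            /* timeout or error */
    return engine_poll_once(e);
}

/* Spinning only works if reads on the fd never block. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Example progress function: ctx points at a non-blocking fd to drain. */
static int drain_fd(void *ctx)
{
    char buf[64];
    ssize_t n = read(*(int *)ctx, buf, sizeof buf);
    return n > 0 ? 1 : 0;
}
```

A caller would register drain_fd() with the read end of a non-blocking pipe or socket and call engine_progress() in its main loop. A single entry registered with fd -1 keeps the whole engine in pure polling mode, which mirrors the "all registered progress functions have valid fd's" condition in the summary.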