George added an MCA parameter for it (opal_event_include is a string that can be set to "select" or "poll"), but it has to be set before opal_init().

Josh: could you try running with the MCA parameter opal_event_include set to "select"? This would confirm Brian's hypothesis...

Given that opal_init() is the first thing that happens in ompi_mpi_init(), this may not be enough -- you could *detect* that we can't do BLCR, but this mechanism doesn't allow libmpi to set something that says "reset libevent so that it only uses select()."

George -- is that hard to add? I imagine it could be kinda difficult to reset libevent once there are already users of it (fds and other events that may have been added, etc.)...?


On Mar 18, 2008, at 4:29 PM, Brian W. Barrett wrote:

Jeff / George -

Did you add a way to specify which event modules are used? Because epoll
pushes the socket list into the kernel, I can see how it would screw up
BLCR. I bet everything would work if we forced the use of poll / select.

Brian

On Tue, 18 Mar 2008, Jeff Squyres wrote:

Crud, ok.  Keep us posted.

On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:

I'm testing with checkpoint/restart and the new libevent seems to be
messing up the checkpoints generated by BLCR. I'll be taking a look
at it over the next couple of days, but just thought I'd let people
know. Unfortunately I don't have any more details at the moment.

-- Josh

On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:

WHAT: Bring new version of libevent to the trunk.

WHY: Newer version, slightly better performance (lower overheads /
lighter weight), properly integrate the use of epoll and other
scalable fd monitoring mechanisms.

WHERE: 98% of the changes are in opal/event; there are a few changes to
configury and one change to the orted.

TIMEOUT: COB, Friday, 21 March 2008

DESCRIPTION:

George/UTK has done the bulk of the work to integrate a new version
of libevent on the following tmp branch:

   https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge

** WE WOULD VERY MUCH APPRECIATE IT IF PEOPLE COULD MTT TEST THIS BRANCH! **

Cisco ran MTT on this branch on Friday and everything checked out
(i.e., no more failures than on the trunk). We just made a few more
minor changes today and I'm running MTT again now, but I'm not
expecting any new failures (MTT will take several hours).  We would
like to bring the new libevent in over this upcoming weekend, but
would very much appreciate it if others could test on their platforms
(Cisco tests mainly 64-bit RHEL4U4). This new libevent *should* be a
fairly side-effect-free change, but since we're now using epoll and
other scalable fd monitoring tools, it is possible that we'll run into
some unanticipated issues on some platforms.

Here's a consolidated diff if you want to see the changes:

https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842

Thanks.

--
Jeff Squyres
Cisco Systems

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems
