- The daemons know whether checkpointing is enabled. They can set the environment variable that forces opal_event_include to "select".
- Since environment variables take priority over the configuration file, this will work in most cases (except when the user adds the MCA parameter by hand).
- In the checkpoint/restart code, we can add a test that checks the value of opal_event_include, prints a message if the value is not "select", and disables the checkpoint/restart functionality.
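The points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Open MPI code: force_select_backend() and cr_event_backend_ok() are hypothetical names, and the sketch reads the value from the OMPI_MCA_-prefixed environment variable, whereas the real checkpoint/restart code would presumably ask the MCA parameter system for the resolved value.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Open MPI picks up MCA parameters from environment variables named
 * OMPI_MCA_<parameter name>. */
#define EVENT_INCLUDE_ENV "OMPI_MCA_opal_event_include"

/* Hypothetical daemon-side step: when checkpointing is enabled, force the
 * select() backend for the processes the daemon launches. The final
 * argument (1) overwrites any value already present in the environment.
 * Returns 0 on success, -1 on failure (as setenv() does). */
static int force_select_backend(void)
{
    return setenv(EVENT_INCLUDE_ENV, "select", 1);
}

/* Hypothetical checkpoint/restart-side test: returns 1 if the event
 * backend is "select" and checkpoint/restart may stay enabled; otherwise
 * prints a message and returns 0 so the caller can disable it. */
static int cr_event_backend_ok(const char *value)
{
    if (value != NULL && strcmp(value, "select") == 0) {
        return 1;
    }
    fprintf(stderr,
            "opal_event_include is \"%s\", not \"select\"; "
            "disabling checkpoint/restart\n",
            value != NULL ? value : "(unset)");
    return 0;
}
```

A daemon would call force_select_backend() before launching its children, and the checkpoint/restart setup would call cr_event_backend_ok(getenv(EVENT_INCLUDE_ENV)) and turn itself off when the result is 0.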
  george.

On Mar 18, 2008, at 4:59 PM, Jeff Squyres wrote:

George added an MCA parameter for it (opal_event_include is a string that can be set to "select" or "poll"), but it has to be set before opal_init().

Josh: could you try running with the MCA parameter opal_event_include set to "select"? This would confirm Brian's hypothesis...

Given that opal_init() is the first thing that happens in ompi_mpi_init(), this may not be enough -- you could *detect* that we can't do BLCR, but this mechanism doesn't allow libmpi to set something saying "reset libevent to be able to only use select()."

George -- is that hard to add? I would imagine that it could be kinda difficult to reset libevent after there are already users of it, fd's and other events that may have been added, etc...?

On Mar 18, 2008, at 4:29 PM, Brian W. Barrett wrote:

Jeff / George -

Did you add a way to specify which event modules are used? Because epoll pushes the socket list into the kernel, I can see how it would screw up BLCR. I bet everything would work if we forced the use of poll / select.

Brian

On Tue, 18 Mar 2008, Jeff Squyres wrote:

Crud, ok. Keep us posted.

On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:

I'm testing with checkpoint/restart and the new libevent seems to be messing up the checkpoints generated by BLCR. I'll be taking a look at it over the next couple of days, but just thought I'd let people know. Unfortunately I don't have any more details at the moment.

-- Josh

On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:

WHAT: Bring new version of libevent to the trunk.

WHY: Newer version, slightly better performance (lower overheads / lighter weight), properly integrate the use of epoll and other scalable fd monitoring mechanisms.

WHERE: 98% of the changes are in opal/event; there's a few changes to configury and one change to the orted.

TIMEOUT: COB, Friday, 21 March 2008

DESCRIPTION:

George/UTK has done the bulk of the work to integrate a new version of libevent on the following tmp branch:

https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge

** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS BRANCH! **

Cisco ran MTT on this branch on Friday and everything checked out (i.e., no more failures than on the trunk). We just made a few more minor changes today and I'm running MTT again now, but I'm not expecting any new failures (MTT will take several hours). We would like to bring the new libevent in over this upcoming weekend, but would very much appreciate if others could test on their platforms (Cisco tests mainly 64 bit RHEL4U4). This new libevent *should* be a fairly side-effect free change, but it is possible that since we're now using epoll and other scalable fd monitoring tools, we'll run into some unanticipated issues on some platforms.

Here's a consolidated diff if you want to see the changes:

https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842

Thanks.

--
Jeff Squyres
Cisco Systems

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel