I have some more data from the field.

With "opal_event_include" left unset (the default), BLCR gives me the following error when I try to restart a 2-process 'noop' MPI application:
----------------------------
shell$ ompi-restart ompi_global_snapshot_8587.ckpt
Restart failed: Bad file descriptor
Restart failed: Bad file descriptor
shell$
----------------------------

If I set "opal_event_include" to "select" then I get a different message, this one from Open MPI:
----------------------------
shell$ ompi-restart ompi_global_snapshot_8543.ckpt
[warn] select: Bad file descriptor
[odin001.cs.indiana.edu:18027] opal_event_base_loop: ompi_evesel->dispatch() failed.
[warn] select: Bad file descriptor
[odin001.cs.indiana.edu:18027] opal_event_base_loop: ompi_evesel->dispatch() failed.
[warn] select: Bad file descriptor
...
----------------------------
This repeats until I kill the restarted job. I've figured out what is outputting the error message, but I can't yet say exactly why. Still digging.

If I set "opal_event_include" to "poll" then everything is fine. The restart works as expected in all scenarios. :)

I'm currently using BLCR 0.6.0 Beta 6 on this machine. I've requested that BLCR be upgraded on this machine so I can test the latest version to see if the poll/epoll problem persists. I'll work with Paul if this turns up anything.

As far as what Open MPI needs to do, I don't think we need to do anything at the moment. I can add the MCA parameter to the 'ft-enable-cr' AMCA file, which will work as a temporary fix.

Thanks for all your help in tracking this problem.

Cheers,
Josh

On Mar 18, 2008, at 5:19 PM, George Bosilca wrote:

It's like rewriting libevent from scratch. I guess it can be done, but it will be a long and painful process. How about the following solution:

- the daemons are aware that checkpointing is enabled. They can set the environment variable that forces opal_event_include to select.

- as environment variables have a higher priority than the configuration file, this will work in most cases (except when the user adds the MCA parameter by hand).

- in the checkpoint/restart code, we can add a test that checks the value of opal_event_include, prints a message if the value is not select, and disables the checkpoint/restart functionality.
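
The steps above could be sketched roughly as follows. This is only an illustration, not Open MPI code: the function names cr_force_event_backend() and cr_event_backend_ok() are hypothetical, and the sketch assumes the parameter reaches the process through the usual OMPI_MCA_-prefixed environment variable. A real check would more likely query the MCA parameter system than call getenv() directly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the MCA parameter is exported as OMPI_MCA_<param name>. */
#define EVENT_INCLUDE_ENV "OMPI_MCA_opal_event_include"

/* Hypothetical daemon-side step: export the wanted event backend so that
 * processes launched afterwards inherit it.
 * Returns 0 on success, -1 on failure (same as setenv). */
int cr_force_event_backend(const char *backend)
{
    return setenv(EVENT_INCLUDE_ENV, backend, 1);
}

/* Hypothetical check for the checkpoint/restart code.
 * Returns 1 if the backend matches `required`, 0 if it does not
 * (in which case the caller would disable checkpoint/restart). */
int cr_event_backend_ok(const char *required)
{
    const char *value = getenv(EVENT_INCLUDE_ENV);

    if (value != NULL && strcmp(value, required) == 0) {
        return 1;
    }
    fprintf(stderr,
            "checkpoint/restart disabled: opal_event_include is \"%s\", "
            "expected \"%s\"\n",
            value != NULL ? value : "(unset)", required);
    return 0;
}
```

The required backend is passed in as an argument rather than hard-coded to "select", since the restart results reported at the top of this thread point to "poll" as the value that works with BLCR on this machine.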

  george.

On Mar 18, 2008, at 4:59 PM, Jeff Squyres wrote:

George added an MCA parameter for it (opal_event_include is a string
that can be set to "select" or "poll"), but it has to be set before
opal_init().

Josh: could you try running with the MCA parameter opal_event_include
set to "select"?  This would confirm Brian's hypothesis...

Given that opal_init() is the first thing that happens in
ompi_mpi_init(), this may not be enough -- you could *detect* that we
can't do BLCR, but this mechanism doesn't allow libmpi to set
something saying "reset libevent to be able to only use select()."

George -- is that hard to add? I would imagine that it could be kinda
difficult to reset libevent after there are already users of it, fd's
and other events that may have been added, etc...?


On Mar 18, 2008, at 4:29 PM, Brian W. Barrett wrote:

Jeff / George -

Did you add a way to specify which event modules are used?  Because
epoll pushes the socket list into the kernel, I can see how it would
screw up BLCR.  I bet everything would work if we forced the use of
poll / select.
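
To make the difference concrete, here is a minimal Linux-only sketch (the function names are hypothetical and not taken from libevent or Open MPI). With poll(), the list of descriptors is an ordinary array in user memory that is handed to the kernel on every call. With epoll, the list is registered once with a kernel object that is itself referred to by a file descriptor, which is the extra kernel-side state suspected of confusing BLCR. The sketch shows only where the list lives; it does not reproduce the restart failure.

```c
#include <poll.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* poll(): the interest list is a plain user-space array, passed in full
 * on each call. Returns 1 if fd is readable, 0 otherwise. */
int readable_via_poll(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    return poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN) != 0;
}

/* epoll: the interest list is stored in a kernel object (epfd) and
 * modified with epoll_ctl(). Returns 1 if fd is readable, 0 if not,
 * -1 if the epoll object could not be set up. */
int readable_via_epoll(int fd, int timeout_ms)
{
    struct epoll_event ev, out;
    int n;
    int epfd = epoll_create1(0);   /* the epoll object is itself an fd */

    if (epfd < 0) {
        return -1;
    }
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    memset(&out, 0, sizeof(out));
    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);

    return n > 0 && (out.events & EPOLLIN) != 0;
}
```

Both helpers answer the same question for a single descriptor; the point is that the epoll version leaves state behind in the kernel between calls in a long-running event loop, while the poll version does not.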

Brian

On Tue, 18 Mar 2008, Jeff Squyres wrote:

Crud, ok.  Keep us posted.

On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:

I'm testing with checkpoint/restart and the new libevent seems to be messing up the checkpoints generated by BLCR. I'll be taking a look at it over the next couple of days, but just thought I'd let people know. Unfortunately I don't have any more details at the moment.

-- Josh

On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:

WHAT: Bring new version of libevent to the trunk.

WHY: Newer version, slightly better performance (lower overheads /
lighter weight), properly integrate the use of epoll and other
scalable fd monitoring mechanisms.

WHERE: 98% of the changes are in opal/event; there are a few changes
to configury and one change to the orted.

TIMEOUT: COB, Friday, 21 March 2008

DESCRIPTION:

George/UTK has done the bulk of the work to integrate a new version
of libevent on the following tmp branch:

  https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge

** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS BRANCH! **

Cisco ran MTT on this branch on Friday and everything checked out
(i.e., no more failures than on the trunk).  We just made a few more
minor changes today and I'm running MTT again now, but I'm not
expecting any new failures (MTT will take several hours). We would
like to bring the new libevent in over this upcoming weekend, but
would very much appreciate if others could test on their platforms
(Cisco tests mainly 64 bit RHEL4U4).  This new libevent *should* be a
fairly side-effect free change, but it is possible that since we're
now using epoll and other scalable fd monitoring tools, we'll run into
some unanticipated issues on some platforms.

Here's a consolidated diff if you want to see the changes:

https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842

Thanks.

--
Jeff Squyres
Cisco Systems

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
