After taking a look at how epoll is implemented in the Linux kernel, I can say with 100% certainty that BLCR will not restore the epoll fd correctly. I hope to fix that eventually, but I have too many other things on my plate to address it now.

Since I cannot promise how soon BLCR may be able to resolve this problem, I suggest that Josh continue exploring the alternatives. At least setting "opal_event_include" to "poll" appears to work. It is not clear to me whether the "select" problem is related to BLCR or not. I am guessing that I don't get a say in whether the BLCR/epoll problems should delay the libevent merge, but I trust the rest of you to determine what is in the best interest of OMPI.

-Paul

Josh Hursey wrote:
> I have some more data from the field.
>
> Leaving "opal_event_include" unset (Default) BLCR would give me the
> following error when trying to restart a 2 process 'noop' MPI
> application:
> ----------------------------
> shell$ ompi-restart ompi_global_snapshot_8587.ckpt
> Restart failed: Bad file descriptor
> Restart failed: Bad file descriptor
> shell$
> ----------------------------
[snip]

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900