After taking a look at how epoll is implemented in the Linux kernel, I
can say with 100% certainty that BLCR will not restore the epoll fd
correctly.  I hope to fix that eventually, but have too many other
things on my plate to address it now.

Since I cannot promise how soon BLCR may be able to resolve this
problem, I suggest that Josh continue exploring the alternatives.  At
least "opal_event_include" set to "poll" appears to work.  It is not
clear to me if the "select" problem is related to BLCR or not.

I am guessing that I don't get a say as to whether the BLCR/epoll
problems should delay the libevent merge, but I trust the rest of you to
determine what is in the best interest of OMPI.

-Paul

Josh Hursey wrote:
> I have some more data from the field.
> 
> Leaving "opal_event_include" unset (Default) BLCR would give me the  
> following error when trying to restart a 2 process 'noop' MPI  
> application:
> ----------------------------
> shell$ ompi-restart ompi_global_snapshot_8587.ckpt
> Restart failed: Bad file descriptor
> Restart failed: Bad file descriptor
> shell$
> ----------------------------
[snip]

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
