It has been on my to-do list for a while to start a FAQ listing of the
various resilience/FT related activities in and around Open MPI. This would
provide a starting point where users and new developers could go for an
overview of each feature and how to activate/use it.

I'll try to bump that up the priority list and post a message once it is
ready. Probably a month or so off since I need to collect some information
from various developers.

-- Josh

On Sun, Jun 26, 2011 at 6:01 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I think we're some ways away from declaring a "resilient ORTE". Josh and I
> have been committing pieces of it over the last two years, and Wes just
> committed another piece the other day that might have been titled "fault
> tolerant OOB" as it primarily addressed maintaining comm routing during node
> failures.
>
> Setting aside the obvious MPI issues, there are several
> branches/organizations working on different aspects of the ORTE problem,
> including:
>
> * fault prediction and proactive migration
>
> * mapping algorithms to minimize failure cascades
>
> * simultaneous failure handling
>
> * alternative wiring methods that eliminate the OOB routing issues
>
> etc. We expect most of those developments to arrive over the next 6-12
> months. Once that has occurred, we'll probably be close to what we would
> call a "resilient" system.
>
> Until then, we are improving, but still far from "resilient".
>
>
> On Jun 24, 2011, at 10:24 AM, Ken Lloyd wrote:
>
>  Josh and Wesley,
>
> Will you be presenting Resilient ORTE at Resilience 2011 in Bordeaux?
>
> http://xcr.cenit.latech.edu/resilience2011/
>
>   =====================
> *Kenneth A. Lloyd*
> CEO - Director of Systems Science
> Watt Systems Technologies Inc.
> www.wattsys.com
> kenneth.ll...@wattsys.com
>
>
>
>   _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
