Indeed.  Some interesting news here:

http://www.enterprisetech.com/2016/03/04/docker-acquires-apache-aurora-founders/

Us old-style guys are going to have our lunch money stolen by young
upstarts. Or is that startups?
Seriously - these guys know how to keep things running at scale and how to
tolerate failures.

On 3 March 2016 at 23:30, Christopher Samuel <[email protected]> wrote:

> On 04/03/16 06:40, Douglas Eadline wrote:
>
> > Yes, failure needs to be an option.
>
> The Slurm folks have been working on failure management support for a
> little while, the idea being you can have a pool of spare nodes to pick
> from (or alternatively bargain with a scheduler for a node that's
> currently busy to come free later on and then add it to the job,
> potentially extending the walltime to make up for the shortfall).
>
> A better description from someone with higher caffeination is here:
>
> http://slurm.schedmd.com/nonstop.html
>
> All the best,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: [email protected] Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
>
> _______________________________________________
> Beowulf mailing list, [email protected] sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>