----- Forwarded message from "Ronald G. Minnich" <[email protected]> -----

From: "Ronald G. Minnich" <[email protected]>
Date: Thu, 3 Feb 2005 11:28:40 -0700 (MST)
To: Matt Leininger <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: [Clusters_sig] cluster projects list



On Thu, 3 Feb 2005, Matt Leininger wrote:

>    Here are a few open source projects that Ron Minnich and I are working
> on.  Ron works at Los Alamos National Lab which has at least 3000-4000
> cluster nodes running BProc/Clustermatic.  I work at Sandia National
> Labs that also has about 3000-4000 Linux cluster nodes.  Our Linux
> cluster base at the national labs continues to grow at a rapid pace.

With regard to this note, I would like to add a few things. We have a 1700-node
Opteron cluster, a 1024-node Xeon cluster, and lots of 256- and 128-node
clusters running linuxbios and bproc.

A big concern for us is performance at scale. The 1700-node cluster boots
in about 3 minutes, from power-on to full usability. Starting an MPI job
across 1024 nodes on bproc takes about 3 seconds with a 16 MB image --
note that this is a migration. A key part of bproc's performance at scale is
the use of an asymmetric model -- there is one distinguished node, the 
master node, from which all the resources are visible. Slave nodes do not 
have the same visibility. 

32-node clusters tend to boot in 30 seconds or so. bproc is so fast that
you can barely tell there is any MPI startup cost at all.

Migration has not proven useful for our needs. The clusters and the nodes
don't really crash. This has led me to think about how I might change
bproc, given that we have no real need for migration.

I can tell people more if there is interest, but I do want to make sure 
that in whatever we do we don't sacrifice performance at scale. 

thanks

ron

_______________________________________________
Clusters_sig mailing list
[EMAIL PROTECTED]
http://lists.osdl.org/mailman/listinfo/clusters_sig

----- End forwarded message -----
-- 
Eugen* Leitl  http://leitl.org
______________________________________________________________
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net
