----- Forwarded message from "Ronald G. Minnich" <[email protected]> -----
From: "Ronald G. Minnich" <[email protected]>
Date: Thu, 3 Feb 2005 11:28:40 -0700 (MST)
To: Matt Leininger <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: [Clusters_sig] cluster projects list

On Thu, 3 Feb 2005, Matt Leininger wrote:

> Here are a few open source projects that Ron Minnich and I are working
> on. Ron works at Los Alamos National Lab which has at least 3000-4000
> cluster nodes running BProc/Clustermatic. I work at Sandia National
> Labs that also has about 3000-4000 Linux cluster nodes. Our Linux
> cluster base at the national labs continues to grow at a rapid pace.

With respect to this note, I would like to add a few things. We have a
1700 node Opteron cluster, a 1024 node Xeon cluster, and lots of 256 and
128 node clusters running LinuxBIOS and bproc.

A big concern to us is performance at scale. The 1700 node cluster boots
in about 3 minutes, from power on to full usability. Starting an MPI job
across 1024 nodes, on bproc, takes about 3 seconds with a 16 MB image --
note this is a migration.

A key part of bproc's performance at scale is the use of an asymmetric
model -- there is one distinguished node, the master node, from which
all the resources are visible. Slave nodes do not have the same
visibility.

32-node clusters tend to boot in 30 seconds or so. bproc is so fast that
you can barely tell there is any MPI startup cost at all.

Migration has not proven useful for our needs. The clusters and the
nodes don't really crash. This fact is leading me to think about how I
might change bproc given that we have no real need for migration.

I can tell people more if there is interest, but I do want to make sure
that in whatever we do we don't sacrifice performance at scale.

thanks

ron

_______________________________________________
Clusters_sig mailing list
[EMAIL PROTECTED]
http://lists.osdl.org/mailman/listinfo/clusters_sig

----- End forwarded message -----
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a>
______________________________________________________________
ICBM: 48.07078, 11.61144 http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
http://moleculardevices.org http://nanomachines.net
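
For scale, the startup figure quoted in the forwarded message (1024 nodes, a 16 MB image, about 3 seconds) can be turned into an aggregate delivery rate. This is only a back-of-envelope sketch: the function name is made up for illustration, "16 MB" is assumed to mean 16 MiB, and it assumes every node ends up with a full copy of the image, which the message does not spell out.

```python
# Back-of-envelope sketch; not a measurement and not part of bproc.
# Assumptions (not from the message): "16 MB" means 16 MiB, and each
# node receives one full copy of the process image.

def aggregate_rate_mib_per_s(nodes, image_mib, seconds):
    """Total image data delivered to all nodes, divided by elapsed time."""
    return nodes * image_mib / seconds

nodes = 1024       # from the message
image_mib = 16     # from the message, read as MiB
seconds = 3.0      # "about 3 seconds" in the message

total_mib = nodes * image_mib
rate = aggregate_rate_mib_per_s(nodes, image_mib, seconds)

print(f"total image data: {total_mib} MiB ({total_mib / 1024:.0f} GiB)")
print(f"aggregate rate:   {rate:.0f} MiB/s")
```

Under those assumptions the job start moves roughly 16 GiB in total, which a single master link sending each copy in turn could not plausibly deliver in 3 seconds, so the copies must fan out in some parallel fashion.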
