From: "Jack K. Horner" <[email protected]>
Date: Wed, 19 Aug 2009 14:31:28 -0700
Subject: Re: [FRIAM] Information request/Amazon EC2
At 09:00 AM 8/19/2009, Doug Roberts wrote:
From: Douglas Roberts <[email protected]>
Date: Tue, 18 Aug 2009 10:38:23 -0600
Subject: [FRIAM] Information request
Hi, all.
I am interested in learning what kind of experiences users of
Amazon's EC2 resources have had. What resources have you used;
what has been your experience with availability, ease of use, cost,
data transfer, privacy, etc.?
TIA,
--Doug
--
Doug Roberts
<mailto:[email protected]>[email protected]
<mailto:[email protected]>[email protected]
505-455-7333 - Office
505-670-8195 - Cell
_______________________________________________
Friam mailing list
[email protected]
http://redfish.com/mailman/listinfo/friam_redfish.com
Doug,
I don't have direct experience with EC2. However, I attended a
computational biology conference about two years ago in which Amazon
gave a talk on the system. Here's what I distilled:
1. If the computation-to-communication ratio of your
application is >> 1 (e.g., the SETI power-spectrum analysis
problem), EC2's network performance is benign. If, in order to
realize a time-to-solution in your lifetime, your application
requires a computation/communication ratio approaching 1 (e.g., an
extreme-scale adaptive Eulerian mesh radiation-hydrodynamics code),
the EC2 network is your enemy.
2. For comparable problem setups, EC2 was less expensive
than buying time on IBM's pay-per-use Blue Gene system.
3. For comparable problem setups and theoretical peaks,
over the lifecycle, EC2 is less expensive per CPU-hour than a
cluster of PCs linked by Fast Ethernet.
4. There was general agreement among the half-dozen or so
users of pay-per-use commercial clusters who were present at the
talk that EC2 gave the best bang for the buck.
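Point 1 can be made concrete with a back-of-envelope model. Here is a
minimal sketch; all throughput and byte counts below are illustrative
assumptions, not measured EC2 figures:

```python
# Back-of-envelope computation-to-communication ratio: the ratio of
# time spent computing to time spent communicating per work unit.
# All inputs are hypothetical, for illustration only.

def comp_comm_ratio(flops_per_step, bytes_per_step,
                    flops_per_sec, bytes_per_sec):
    """Return t_compute / t_communicate for one step of the application."""
    t_comp = flops_per_step / flops_per_sec
    t_comm = bytes_per_step / bytes_per_sec
    return t_comp / t_comm

# SETI-style power-spectrum chunk: many FLOPs, a tiny result shipped back.
seti = comp_comm_ratio(1e12, 1e6, 1e10, 1e8)   # ratio >> 1: network benign

# Tightly coupled mesh step: boundary exchange on every timestep.
mesh = comp_comm_ratio(1e9, 1e9, 1e10, 1e8)    # ratio << 1: network-bound

print(f"SETI-like ratio: {seti:.0f}")
print(f"mesh-like ratio: {mesh:.2f}")
```

With these made-up numbers the SETI-like workload computes ~10,000x longer
than it communicates, while the mesh-like workload spends ~100x longer
communicating than computing, which is the regime where EC2's network
becomes the bottleneck.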
Jack K. Horner
P. O. Box 266
Los Alamos, NM 87544-0266
Voice: 505-455-0381
Fax: 505-455-0382
email: [email protected]
From: Douglas Roberts <[email protected]>
Date: Wed, 19 Aug 2009 14:42:41 -0600
Subject: Re: [FRIAM] Information request/Amazon EC2
Thanks, Jack. I suspect that for distributed message-passing ABM
simulations, Amazon EC2 is not a good solution.
--Doug
--
Doug Roberts
<mailto:[email protected]>[email protected]
<mailto:[email protected]>[email protected]
505-455-7333 - Office
505-670-8195 - Cell
Doug,
Whether a given parallel computing system performs well enough
running a message-passing-oriented Agent Based Modeling (ABM)
application depends on, among other things,
1. How the agents are distributed across the processing
elements (pes, nominally one microprocessor per pe) of the
system. Computational-mesh-oriented (CMO) applications that use
message-passing services are sufficiently analogous to
ABM-oriented applications that we can use mesh performance
data to help bound what ABM performance is likely to be,
given an allocation of agents per pe.
In particular, it is not uncommon for CMO
applications using ~50 state variables per cell to allocate
~100,000 cells per pe; state updates in such a system are
accomplished by message-passing (using OMP or MPI) among cells.
100,000 cells per pe is an empirically derived "rule of thumb",
but it is roughly invariant across modern production-class
compute nodes and a wide spectrum of mesh-oriented applications.
For optimal performance, the cells allocated to a pe should
be the set of cells that communicate most frequently with
each other. Sometimes a user can characterize that set
through a propagation-rate function defined in the
problem space (e.g., the speed of sound in a
medium, the speed at which a virus travels from one agent
to another, the speed of chemical reactions in a
biological network). Sometimes we don't know anything about
the communication/propagation dynamics, in which case
"reading" a pile of steaming chicken entrails predicts
performance about as well as anything else.
By analogy, if there were no more than ~50 state variables
per agent in an ABM application, an allocation of up to
100,000 tightly-communicating agents per pe would provide
usable performance on many production-class clusters today
(a cluster of PlayStations is an exception to
this rule of thumb, BTW).
Allocating one agent per pe would be a vast waste of
compute power for all except trivial problem setups.
All of the above is useful only if the user can control
the allocation of agents to pes. Most production-class
clusters, including the EC2, provide such controls.
Note that this problem has to be addressed by the
*user* in *any* cluster.
2. If the computation/communication ratio has to be near 1
to obtain tolerable time-to-solution, the
performance of the message-passing services matters
hugely. MPI and OMP have been optimized on only a few
commercially available systems. (A home-brew
multi-thousand-node Linux cluster, in contrast, is nowhere
near optimal in this sense. Optimizing the latter, as
a few incorrigibly optimistic souls have discovered,
amounts to redesigning much of Linux process-management.
If bleeding-edge performance matters, there is no free lunch.)
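The locality-aware allocation described in point 1 (co-locating the agents
that communicate most frequently) can be sketched with a toy greedy
partitioner. The agent graph, weights, and capacities below are made up for
illustration; a production code would use a real graph partitioner:

```python
# Toy sketch: assign agents to processing elements (pes) so that pairs of
# agents that exchange the most messages land on the same pe.
# A greedy heuristic, not a serious partitioner.

from collections import defaultdict

def allocate(agents, comm_weight, n_pes, capacity):
    """Greedy locality-aware allocation.

    comm_weight[(a, b)] = messages/step between agents a and b.
    Returns {agent: pe}.
    """
    placement = {}
    load = defaultdict(int)
    # Process the heaviest-traffic pairs first, co-locating endpoints.
    for (a, b), _w in sorted(comm_weight.items(), key=lambda kv: -kv[1]):
        for x, y in ((a, b), (b, a)):
            if x in placement and y not in placement:
                pe = placement[x]
                if load[pe] < capacity:     # room on x's pe: keep y local
                    placement[y] = pe
                    load[pe] += 1
        if a not in placement and b not in placement:
            pe = min(range(n_pes), key=lambda p: load[p])
            if load[pe] + 2 <= capacity:    # start a new local group
                placement[a] = placement[b] = pe
                load[pe] += 2
    # Any leftover agents go to the least-loaded pe.
    for a in agents:
        if a not in placement:
            pe = min(range(n_pes), key=lambda p: load[p])
            placement[a] = pe
            load[pe] += 1
    return placement

agents = list(range(6))
weights = {(0, 1): 9, (1, 2): 8, (3, 4): 7, (4, 5): 6, (2, 3): 1}
# Agents 0-2 end up on one pe and 3-5 on the other, cutting only the
# lightest (weight-1) link between pes.
print(allocate(agents, weights, n_pes=2, capacity=3))
```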
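Point 2's claim, that the message-passing layer dominates once the
computation/communication ratio nears 1, follows from a simple per-step
cost model. The latencies and bandwidths below are hypothetical round
numbers, not benchmarks of any particular interconnect:

```python
# Crude per-step cost model: compute time plus, for each message,
# a fixed latency and a bandwidth-limited transfer term.
# All parameter values are illustrative assumptions.

def step_time(t_comp, n_msgs, msg_bytes, latency, bandwidth):
    """Seconds per simulation step on one pe."""
    t_comm = n_msgs * (latency + msg_bytes / bandwidth)
    return t_comp + t_comm

# Identical workload on a tuned interconnect vs a commodity Ethernet
# cluster: 1 ms of compute, 100 messages of 8 KB per step.
tuned     = step_time(1e-3, 100, 8e3, latency=2e-6,  bandwidth=1e9)
commodity = step_time(1e-3, 100, 8e3, latency=50e-6, bandwidth=1e8)

print(f"tuned:     {tuned * 1e3:.2f} ms/step")
print(f"commodity: {commodity * 1e3:.2f} ms/step")
```

With these numbers the same step takes 2 ms on the tuned fabric and 14 ms
on the commodity one: the compute term is unchanged, so the entire 7x
slowdown comes from the message-passing layer, which is why optimizing it
(or redesigning the OS underneath it) is where the effort goes.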
Jack
Jack K. Horner
P. O. Box 266
Los Alamos, NM 87544-0266
Voice: 505-455-0381
Fax: 505-455-0382
email: [email protected]
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org