On Saturday, 6 December 2014 at 09:24:57 UTC, John Colvin wrote:
> Big simulations still benefit from dedicated clusters. Good performance often requires uniformly extremely low latencies between nodes, as well as the very fastest in distributed storage (read *and* write).
The question is not inter-node performance if you can partition the dataset (which I made a requirement), but how much you pay in total to get the job done. You can tolerate some inefficiency and still come out ahead by renting CPU time, because the total cost of ownership of an under-utilized local server farm can be quite high.
But if the simulation requires a NUMA-like architecture, then you don't have a dataset that you can partition and solve in a map-reduce style.
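To make concrete what I mean by "partition and solve in a map-reduce style": each rented node works on its own chunk with no cross-node communication, and only the small partial results are merged at the end. A minimal single-process sketch (word counting stands in for the real per-chunk work; the function names are my own, not from any particular framework):

```python
from functools import reduce

def partition(data, n):
    """Split data into n roughly equal, independent chunks."""
    k, r = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def map_chunk(chunk):
    """Per-node work: count word frequencies in one chunk.
    Requires no data from any other chunk."""
    counts = {}
    for word in chunk:
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge(a, b):
    """Reduce step: merge two partial count tables."""
    for key, v in b.items():
        a[key] = a.get(key, 0) + v
    return a

words = ["gpu", "cpu", "gpu", "rdma", "cpu", "gpu"]
# The map phase is embarrassingly parallel; each call could run on a
# different rented node since chunks share nothing.
partials = [map_chunk(c) for c in partition(words, 3)]
total = reduce(merge, partials, {})
print(total)  # {'gpu': 3, 'cpu': 2, 'rdma': 1}
```

The point is that inter-node latency barely matters here; a NUMA-style simulation, where every step needs remote memory, has no such decomposition.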
P.S. GPUs are not a panacea for all HPC problems. For example, RDMA between GPUs on different nodes is only a recent development. In general there is a communication bandwidth and latency issue: the more power you pack into each compute unit (GPU, CPU, or whatever), the more bandwidth you need connecting them.
HPC is a special case, and different architectures suit different problems, so you have to tailor the hardware architecture to the problems you want to solve, but then we are not talking about $10,000 servers… If you need RDMA, then you are basically in NUMA land, which is not really suitable for a generic cloud solution in the first place?
