On 02/02/2015 08:38 AM, Michael Di Domenico wrote:
Glenn's article is good and hits on many topics correctly (I've seen a fair number of them, having sat on the vendor side of NSF proposals in a former life). However, I'm a little concerned by what I perceive as his attitude towards stripping funding from centers that don't have the technical prowess to run an HPC resource.
NSF's goal is to further science. I don't believe stripping funding is the correct solution. If a center isn't keeping up or doesn't have the skills from the start, a mentor should be put in place from one of the other, bigger centers. Stripping funding is only going to shrink the pool of knowledge to a few key installations around the US, which probably isn't the best way to spread knowledge. But I do concur there is a point where the NSF would be spreading itself too thin, and it probably already has.
It seems to me the NSF needs to get back to building the HPC community of PEOPLE rather than building hero machines at six or seven installations across the US.
I interpreted it differently. I think he was saying that NSF funding for HPC should be concentrated at fewer sites, similar to what the DOE has done with its leadership computing facilities (LCFs): the Argonne Leadership Computing Facility (ALCF) and the Oak Ridge Leadership Computing Facility (OLCF). By concentrating their resources in fewer locations, they can take advantage of economies of scale:
1. Pay for two large data centers instead of 5 or 10
2. Hire a somewhat larger but much more talented staff whose skills can be spread over several clusters and storage systems, rather than maintaining many smaller support staffs with (most likely) less capability at each site.
And on, and on.
By committing heavily to fewer sites, it's easier for the NSF to focus on providing a stable financial footing than to constantly spread the money around many different sites like they're broadcast seeding a lawn.
TL;DR: Put all your eggs into 2-3 baskets, and keep a really good eye on
those baskets.
Regarding your comment about 'hero' systems: I read a paper a couple of years ago arguing that the large majority of computational scientists don't need these massive exascale systems - most only need a 'department'-sized cluster with ~1024 cores. I believe SDSC did their own study with XSEDE data and came to the same conclusion (Glenn actually told me this; I'm not sure it's published anywhere).
This reminds me of 'The Long Tail' (http://en.wikipedia.org/wiki/Long_tail): the hero systems cater to the small percentage of extremely talented computational scientists at the top of their fields, while the long tail, which is your 'average' computational science PI or grad student at universities around the world, still has to rely on an antiquated, small departmental cluster, because the NSF focuses on the hero users to the detriment of the long tail, which actually represents the bulk of its funded scientists.