Robert G. Brown wrote:
> Sure, but why wouldn't it be cheaper for, e.g., NSF or NIH to fund an exact clone of the service Amazon plans to offer and provide it free to the research groups they support (or rather, do the bookkeeping, but as purely internal bookkeeping: moving money from one pocket to another)?
TANSTAAFL. Someone, somewhere has to pay. And to make this thing useful you need a ton of bandwidth, and you need it cheap.
Bandwidth is not cheap.
> Amazon has to make a profit. Granting agencies don't have to pay the profit that Amazon has to make. Amazon has to take substantial risks to make its profit. Granting agencies have no risk.
No ... they have technological and organizational risks. Technological: will the thing work? Organizational: who is calling the shots, and how many political battles must be won to get the solution done?
> All of the things you assert for DNA sequencing are true for high energy physics: enormous datasets, lots of computation. HEP's INTERNATIONAL solution is ATLAS, not Amazon.
Yup.
> Supporting commercial access into such a DB à la Google, but for genomic data, sure, but that's not really cluster computing; that's a large shared DB. I could see that as a spin-off data service of Amazon or Google, or a new business altogether, but I'd view it as a niche and not really HPC.
Well ... it is cluster computing, but not as most participants here know/understand it.
There is a huge amount of processing associated with this data. It's at minimum O(N) and often O(N x M) for some large M. This processing is invariably integer-based.
But the issue is that, for this research, the vast majority of end users (wet lab, etc) are IO and data motion bound. I see this getting worse over time, not better.
While there is a strong push to do these things at Amazon and other locations, the cost-benefit analysis (CBA) for running many jobs against an ever-increasing bolus of data does in fact favor the small local HPC system.
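To make the O(N) vs. O(N x M) integer workload concrete, here is a minimal sketch, assuming DNA bases are encoded as small integers (A=0, C=1, G=2, T=3) and using a naive ungapped match count. The function names and scoring rule are hypothetical illustrations, not any particular tool's algorithm. Each of the N - M + 1 offsets does M integer comparisons, so the work is O(N x M) with no floating point anywhere.

```python
"""Hypothetical illustration of an O(N x M), integer-only scan.

Assumption: bases are encoded as small integers (A=0, C=1, G=2, T=3);
names and scoring are illustrative only.
"""
from typing import List, Tuple

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> List[int]:
    """Map a DNA string onto integer codes."""
    return [ENCODE[base] for base in seq]


def best_ungapped_hit(reference: List[int], query: List[int]) -> Tuple[int, int]:
    """Slide `query` along `reference`, counting exact base matches.

    Cost is O(N x M) for a reference of length N and a query of length M,
    using only integer comparisons and additions.
    Returns (best_offset, best_match_count); (-1, -1) if the query is longer
    than the reference. Ties keep the earliest offset.
    """
    n, m = len(reference), len(query)
    best_offset, best_score = -1, -1
    for offset in range(n - m + 1):      # O(N) candidate offsets
        score = 0
        for i in range(m):               # O(M) integer compares per offset
            if reference[offset + i] == query[i]:
                score += 1
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    ref = encode("ACGTACGTTGCA")
    qry = encode("GTTG")
    print(best_ungapped_hit(ref, qry))   # prints (6, 4)
```

Real sequence-analysis codes use far smarter indexing, but the point stands: the inner loop is integer compares and adds, and the total work scales with the product of the two sizes.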
> Grant-funded research involving large-scale shared data resources can ALWAYS be done more cheaply than by buying the data services from profit-making third parties, unless there are nonlinear barriers, e.g. proprietary IP. This is trebly true given that research facilities are
Not just less expensive, but more bandwidth. Campus bandwidth often completely dwarfs what you can get off campus, even with Internet2 access. UMich has dedicated fibre pulls (at least they did in 2002 when I worked with them) for gigabit between buildings. Expensive, but they needed it.
The name of the game is IO rates, and data motion rates between data sinks and sources. For these users, treat processing as effectively infinitely fast. Moving the data isn't.
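The data-motion argument is mostly arithmetic: transfer time is bytes × 8 / link rate. Here is a back-of-envelope sketch in which the 1 TB dataset, the 1 Gb/s campus link, and the 100 Mb/s off-campus path are hypothetical figures chosen for illustration, not measurements from this thread. The `efficiency` parameter is likewise an assumed knob for protocol overhead and contention.

```python
"""Back-of-envelope data-motion estimate (illustrative numbers only).

Assumption: dataset size and link rates are hypothetical, chosen to
contrast an on-campus link with a slower off-campus path.
"""

BITS_PER_BYTE = 8


def transfer_hours(dataset_bytes: float, link_bits_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Time to move `dataset_bytes` over a link, in hours.

    `efficiency` (0 < efficiency <= 1) crudely models protocol overhead
    and contention; 1.0 means the full line rate is sustained.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = dataset_bytes * BITS_PER_BYTE / (link_bits_per_s * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    one_tb = 1e12   # hypothetical 1 TB dataset
    campus = 1e9    # hypothetical 1 Gb/s building-to-building link
    wan = 100e6     # hypothetical 100 Mb/s off-campus path
    print(f"campus: {transfer_hours(one_tb, campus):.1f} h")  # campus: 2.2 h
    print(f"WAN:    {transfer_hours(one_tb, wan):.1f} h")     # WAN:    22.2 h
```

At these assumed rates the campus link moves the dataset in about 2.2 hours versus about 22.2 hours off campus, a tenfold gap that extra CPU does nothing to close.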
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf