Robert G. Brown wrote:

Sure, but why wouldn't it be cheaper for e.g. NSF or NIH to fund an
exact clone of the service Amazon plans to offer and provide it for free
to its supported research groups (or rather, do bookkeeping but it is
all internal bookkeeping, moving money from one pocket to another).

TANSTAAFL. Someone, somewhere has to pay. And to make this thing useful you need a ton of bandwidth, and you need it cheap.

Bandwidth is not cheap.

Amazon has to make a profit.  Granting agencies don't have to pay the
profit that Amazon has to make.  Amazon has to take substantial risks to
make its profit.  Granting agencies have no risk.

No ... they have technological and organizational risks. Technological: will the thing work? Organizational: who is calling the shots, and how many political battles must be won to get the solution done?

All of the things you assert for DNA sequencing are true for high energy
physics.  Enormous datasets, lots of computation.  HEP's INTERNATIONAL
solution is ATLAS, not Amazon.

Yup.

Supporting commercial access into such a DB a la >>google<< but for
genomic data, sure, but that's not really cluster computing, that's a
large shared DB.  I could see that as a spin off data service of Amazon
or Google or a new business altogether, but I'd view it as a niche and
not really HPC.

Well ... it is cluster computing, but not as most participants here know/understand it.

There is a huge amount of processing associated with this data. It's at minimum O(N) and often O(N x M) for some large M. This processing is invariably integer based.
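
To make the O(N) versus O(N x M) distinction concrete, here is a rough Python sketch. The function names and the choice of GC counting and Hamming distance are purely illustrative assumptions, not any particular genomics pipeline; the point is only that both are integer-only work, and that comparing every read against every reference multiplies the cost.

```python
def count_gc(seq: str) -> int:
    """O(N): a single pass over N bases, using only an integer counter."""
    return sum(1 for base in seq if base in "GC")


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def best_reference(reads, refs):
    """O(N x M) comparisons: each of N reads is checked against each of M references.

    Each comparison is itself linear in the read length, so the total work
    grows even faster than N x M. Returns, for each read, the index of the
    reference with the fewest mismatches.
    """
    return [
        min(range(len(refs)), key=lambda j: hamming(read, refs[j]))
        for read in reads
    ]
```

Real tools use indexing to avoid the full N x M scan, but the data still has to be read and moved at least once, which is where the next point comes in.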

But the issue is that, for this research, the vast majority of end users (wet lab, etc.) are IO and data motion bound. I see this getting worse over time, not better.

While there is a strong push to try to do these things at Amazon and other locations, the cost-benefit analysis (CBA) for doing many jobs with an ever-increasing bolus of data does in fact favor the small local HPC system.
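
A back-of-the-envelope version of that CBA can be sketched as below. Everything here is an assumption for illustration: the function names, the simple cost model (a fixed local cost plus a per-job cost, versus a remote per-job cost that grows with data moved), and any numbers you plug in. It is not anyone's actual pricing.

```python
import math


def remote_job_cost(compute_cost, data_gb, transfer_cost_per_gb):
    """Hypothetical remote cost per job: compute charge plus a per-GB data charge."""
    return compute_cost + data_gb * transfer_cost_per_gb


def breakeven_jobs(local_fixed_cost, local_cost_per_job, remote_cost_per_job):
    """Smallest job count at which the local system's total cost no longer exceeds remote.

    Returns None when the remote per-job cost is not higher than the local
    per-job cost, since the fixed local cost is then never recovered.
    """
    saving_per_job = remote_cost_per_job - local_cost_per_job
    if saving_per_job <= 0:
        return None
    return math.ceil(local_fixed_cost / saving_per_job)
```

With made-up inputs, breakeven_jobs(50000, 5, remote_job_cost(5, 100, 0.5)) says how many jobs it takes for a local box to pay for itself. As the data per job grows, the remote per-job cost grows with it and the break-even point arrives sooner.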

Grant funded research involving large scale shared data resources can
ALWAYS be done more cheaply than by buying the data services from
profit-making third parties unless there are nonlinear e.g. proprietary
IP barriers.  This is trebly true given that research facilities are

Not just less expensive, but more bandwidth. Campus bandwidths often completely dwarf what you can get out of the campus, even if you can get Internet2 access. UMich has dedicated fibre pulls (at least they did in 2002 when I worked with them) for gigabit between buildings. Expensive, but they needed it.

The names of the games are IO rates, and data motion rates between data sinks/sources. Call processing effectively infinitely fast for these users. Moving the data isn't.
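
To put rough numbers on data motion, here is a minimal sketch of the arithmetic. The function name, the efficiency fudge factor, and the example sizes and link speeds are assumptions for illustration, not measurements.

```python
def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Idealized time to move size_bytes over a link.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the nominal
    link rate actually achieved; protocol overhead and contention push it
    below 1 in practice.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    one_terabyte = 10**12  # bytes, hypothetical dataset size
    for label, rate in [("1 Gbit/s", 10**9), ("100 Mbit/s", 10**8)]:
        hours = transfer_seconds(one_terabyte, rate) / 3600
        print(f"{label}: {hours:.1f} hours")
```

Even at full gigabit line rate a terabyte takes over two hours to move, and ten times that on a 100 Mbit/s link. If the processing itself takes minutes, the transfer is the job.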

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
