Re: [Beowulf] Station wagon full of tapes

Joe Landman Tue, 26 May 2009 09:25:17 -0700

Robert G. Brown wrote:

Sure, but why wouldn't it be cheaper for e.g. NSF or NIH to fund an
exact clone of the service Amazon plans to offer and provide it for free
to its supported research groups (or rather, do bookkeeping but it is
all internal bookkeeping, moving money from one pocket to another).

TANSTAAFL. Someone, somewhere has to pay. And to make this thing useful you need a ton of bandwidth, and you need it cheap.


Bandwidth is not cheap.

Amazon has to make a profit.  Granting agencies don't have to pay the
profit that Amazon has to make.  Amazon has to take substantial risks to
make its profit.  Granting agencies have no risk.

No ... they have technological and organizational risks. Technological: will the thing work. Organizational: who is calling the shots, and how many political battles must be won to get the solution done.

All of the things you assert for DNA sequencing are true for high energy
physics.  Enormous datasets, lots of computation.  HEP's INTERNATIONAL
solution is ATLAS, not Amazon.


Yup.

Supporting commercial access into such a DB a la >>google<< but for
genomic data, sure, but that's not really cluster computing, that's a
large shared DB.  I could see that as a spin off data service of Amazon
or Google or a new business altogether, but I'd view it as a niche and
not really HPC.

Well ... it is cluster computing, but not as most participants here know/understand it.

There is a huge amount of processing associated with this data. It's at minimum O(N) and often O(N x M) for some large M. This processing is invariably integer based.

But the issue is that, for this research, the vast majority of end users (wet lab, etc.) are IO and data motion bound. I see this getting worse over time, not better.

While there is a strong push to try to do these things at Amazon and other locations, the CBA for doing many jobs with ever increasing bolus of data does in fact favor the small local HPC system.

Grant funded research involving large scale shared data resources can
ALWAYS be done more cheaply than by buying the data services from
profit-making third parties unless there are nonlinear e.g. proprietary
IP barriers.  This is trebly true given that research facilities are

Not just less expensive, but more bandwidth. Campus bandwidths often completely dwarf what you can get out of the campus, even if you can get Internet2 access. UMich has dedicated fibre pulls (at least they did in 2002 when I worked with them) for gigabit between buildings. Expensive, but they needed it.

The names of the games are IO rates, and data motion rates between data sinks/sources. Call processing effectively infinitely fast for these users. Moving the data isn't.
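
A back-of-the-envelope sketch of the data motion point, in Python. Only the gigabit campus link comes from the text above; the 1 TB dataset size, the 100 Mb/s off-campus rate, and the 80% link efficiency are illustrative assumptions, not measurements, and transfer_hours is a hypothetical helper name.

```python
def transfer_hours(dataset_bytes, link_bits_per_sec, efficiency=0.8):
    """Hours needed to move dataset_bytes over a link.

    efficiency is an assumed fraction of the nominal line rate that is
    actually achieved (protocol overhead, contention); 0.8 is a guess.
    """
    usable_bits_per_sec = link_bits_per_sec * efficiency
    return dataset_bytes * 8 / usable_bits_per_sec / 3600.0


# Assumption: one job's data set is 1 TB (10**12 bytes).
DATASET_BYTES = 1e12

# Nominal link rates in bits per second.
LINKS = {
    "campus gigabit (building to building)": 1e9,
    "off-campus link (assumed 100 Mb/s)": 100e6,
}

for name, rate in LINKS.items():
    hours = transfer_hours(DATASET_BYTES, rate)
    print(f"{name}: {hours:.1f} hours to move 1 TB")
```

With processing treated as effectively free, the transfer time is the whole cost of a job under these assumptions, and it scales linearly with the size of the data and inversely with the link rate.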


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
