Jim Lux wrote:
At 12:04 AM 3/16/2006, Daniel Pfenniger wrote:
The shipment of this accelerator card has been delayed many times; the last
time I asked was October 2005. Apparently the first shipment was made this
month, for a Japanese supercomputer with 10^4 Opterons. The cost is not
indicated, but anything much above $8000 per card would put it outside
commodity hardware. I wouldn't be astonished if more performance could
be obtained in most applications with commodity clustering.
I think under $10k keeps it commodity (read: what most managers could
likely sign for themselves without needing to walk the approval ladder).
There are probably applications where a dedicated card can blow the
doors off a collection of PCs. At some point, the interprocessor
communication latency inherent in any sort of cabling between processors
would start to dominate.
There are numerous such examples in the life sciences, in chemistry, and
other areas. Such cards are not universal; they cannot be viewed as
general-purpose processors. You have to view them as dedicated attached
processors.
The Clearspeed cards carry two of their co-processors, each with 96 FP
units; I believe the architecture is a systolic array. To program them
at a high level you have a C variant you can use, or you can hand-code
assembly. The latter is hard.
The issue for these cards is memory bandwidth in and out of the
PCI-X based interface. There are tricks you can play in a well-designed
system, but you cannot escape the bandwidth ceiling of PCI-X.
For many algorithms of potential interest to this list, memory bandwidth
is as important as FP performance. Having effectively 100 processors on
the far side of a narrow pipe means you have to design your algorithms
with that pipe width in mind.
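To put rough numbers on that (these are illustrative assumptions, not
measured Clearspeed figures): assume ~1 GB/s peak for 64-bit/133 MHz PCI-X
and a nominal 50 GFLOP/s for the card, and you can work out how much
arithmetic per byte an algorithm has to do before the bus stops being the
bottleneck. Something like:

    /* Back-of-the-envelope: how much arithmetic per byte must an
     * algorithm do before an attached accelerator behind PCI-X stops
     * being bandwidth-starved?  All figures below are assumptions for
     * illustration, not Clearspeed specs. */
    #include <stdio.h>

    int main(void)
    {
        double pci_x_bw   = 1.06e9;  /* assumed: 64-bit/133 MHz PCI-X peak, bytes/s */
        double card_flops = 50.0e9;  /* assumed: nominal FLOP/s for the card        */

        /* FLOPs the card must perform per byte moved across the bus
         * just to keep its FP units busy. */
        double flops_per_byte = card_flops / pci_x_bw;

        /* Example: dense matrix multiply on n x n doubles moves roughly
         * 3*n*n*8 bytes and does 2*n^3 FLOPs, so its intensity grows
         * with n -- the kind of kernel that survives the narrow pipe. */
        int n = 2000;
        double dgemm_intensity = (2.0 * n * n * n) / (3.0 * n * n * 8.0);

        printf("need ~%.0f FLOPs per byte to hide the bus\n", flops_per_byte);
        printf("dgemm at n=%d delivers ~%.0f FLOPs per byte\n", n, dgemm_intensity);
        return 0;
    }

Low-intensity kernels (sparse matrix-vector products, streaming filters)
never get near that ratio, which is part of why these look like dedicated
attached processors rather than general-purpose ones.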
If Clearspeed considered mass production at a cost of, say, $100-$500
per card, the market would be huge, because the card would then be
competing with multi-core processors like the IBM-Sony Cell.
Kahan had some interesting things to say about the Cell, summarized
roughly as: with Cell you get to pick one of fast or accurate. He was
making this point in general but pointed out some specific issues; this
is from a talk on his web site. Caveat: I don't have a Cell to play with
(yes Santa, I would like one or two hundred), so I can't run paranoia or
other fun tests.
You need "really big" volumes to get there. Retail pricing of $200
implies a bill-of-materials cost down in the sub-$20 range.
Yup. Volume drives lower pricing; economies of scale matter. This is
why FPGAs are priced where they are: they don't have large volumes.
If they did, pricing would be better.
Considering that a run-of-the-mill ASIC spin costs >$1M (for a small
number of parts produced), your volume has to be several hundred thousand
(or a million) before you even cover the cost of your development.
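Just to make that arithmetic explicit (the margin figure is purely an
assumption for illustration):

    /* Break-even volume for an ASIC spin.  The NRE and margin figures
     * are illustrative assumptions, not anyone's actual costs. */
    #include <stdio.h>

    int main(void)
    {
        double nre             = 1.0e6;  /* assumed ASIC spin cost, dollars   */
        double margin_per_unit = 5.0;    /* assumed profit per card after BOM */

        /* $1M / $5 per unit = 200,000 units -- "several hundred thousand"
         * before development is even paid off. */
        printf("break-even volume: %.0f units\n", nre / margin_per_unit);
        return 0;
    }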
The video card folks can do this because
a) each successive generation of cards is derived from the past, so the
NRE is lower; most of the card (and IC) is the same
I believe they are in incremental improvement mode. This keeps redesign
costs way down.
b) they have truly gargantuan volumes
This is the critical thing. Remember, these are highly pipelined
graphical supercomputers. The ClawHMMer project ran a hardware-accelerated
HMMer on an nVidia GT6800 5x faster than the P4 hosting the card.
c) they have sales from existing products to provide cash to support the
development of version N+1.
Cash is king.
{I leave aside the possibility of magic elves, although with some
consumer products, I have no idea how they can design, produce, and sell
it at the price they do. Making use of relative currency values can
also help, but that's in the non-technological magic elf category, as
far as I'm concerned.}
Actually lots of stuff is done outside the US these days. Not magic
elves per se, but Indian and Chinese engineers and scientists who are
extremely good at what they do. This starts getting into a cost and
productivity discussion rather rapidly.
The most interesting niche for the Clearspeed cards, it appears to me,
is accelerating proprietary applications like Matlab, Mathematica and
particularly Excel, which run on a single PC and can hardly be
reprogrammed by their users to run on a distributed cluster.
I would say that there is more potential for a clever soul to reprogram
the guts of Matlab, etc., to transparently share the work across
multiple machines. I think that's in the back of the mind of MS, as
they move toward a services environment and .NET
:)
So imagine, if you will, an LD_PRELOAD environment variable which
points a user's code over to the relevant libraries, which work their
magic behind the scenes. I would be hard pressed to imagine using this
for Excel, but could see it for Matlab: programming at high levels with
high performance. Of course Kahan also rips into them over accuracy ...
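A minimal sketch of that kind of interposition, assuming the application
resolves the standard Fortran BLAS symbol dgemm_ at run time;
accelerated_dgemm below is a hypothetical stand-in for whatever offload
routine a vendor library would actually provide:

    /* Minimal LD_PRELOAD interposer sketch: catch the application's
     * dgemm_ calls and decide whether to ship them to an accelerator
     * or fall through to the host BLAS.  "accelerated_dgemm" is a
     * hypothetical stand-in for a vendor offload routine. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stddef.h>

    typedef void (*dgemm_fn)(const char *transa, const char *transb,
                             const int *m, const int *n, const int *k,
                             const double *alpha, const double *a, const int *lda,
                             const double *b, const int *ldb,
                             const double *beta, double *c, const int *ldc);

    void dgemm_(const char *transa, const char *transb,
                const int *m, const int *n, const int *k,
                const double *alpha, const double *a, const int *lda,
                const double *b, const int *ldb,
                const double *beta, double *c, const int *ldc)
    {
        static dgemm_fn host_dgemm = NULL;
        if (!host_dgemm)                 /* look up the real BLAS symbol once */
            host_dgemm = (dgemm_fn)dlsym(RTLD_NEXT, "dgemm_");

        /* Only large problems amortize the trip across the bus; small ones
         * stay on the host.  The 512 cutoff is an arbitrary example. */
        if (*m >= 512 && *n >= 512 && *k >= 512) {
            /* accelerated_dgemm(transa, transb, m, n, k, alpha,
             *                   a, lda, b, ldb, beta, c, ldc);  (hypothetical) */
            host_dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
        } else {
            host_dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
        }
    }

Build it with something like "gcc -shared -fPIC -o libinterpose.so
interpose.c -ldl", set LD_PRELOAD=./libinterpose.so before launching
Matlab, and the application never knows the big multiplies went somewhere
else. Excel is harder, as noted above, not least because Windows has no
LD_PRELOAD.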
Jim
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf