Jim Lux wrote:
At 12:04 AM 3/16/2006, Daniel Pfenniger wrote:

The shipment of this accelerator card has been delayed many times. The last
time I asked was October 2005. Apparently the first shipment was made this
month, for a Japanese supercomputer with 10^4 Opterons.   The cost is not
indicated, but anything above roughly $8000 per card would put it outside
commodity hardware.  I wouldn't be astonished if more performance could be
obtained in most applications with commodity clustering.

I think under $10k keeps it commodity (read: what most managers could likely sign off on themselves without needing to walk the approval ladder).

There are probably applications where a dedicated card can blow the doors off a collection of PCs. At some point, the interprocessor communication latency inherent in any sort of cabling between processors would start to dominate.

There are numerous such examples in life sciences, in chemistry, and other areas. Such cards are not universal; they cannot be viewed as general-purpose processors. You have to view them as dedicated attached processors.

The Clearspeed cards carry two of their co-processors, each with 96 FP units. I believe the architecture is a systolic array. To program them at a high level you have a C variant you can use, or you can hand-code assembly. The latter is hard.
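To give a flavor of the programming model, here is a strip-mined loop in plain, portable C. It only illustrates the kind of data-parallel structure such an array wants to see; it is not ClearSpeed's actual C dialect, whose syntax I won't try to reproduce from memory:

/* The flavor of code such an array wants: the same operation applied
 * across a strip of data, one element per processing element.  Plain
 * portable C, purely as an illustration; this is NOT ClearSpeed's
 * actual C dialect. */
#include <stddef.h>

#define LANES 96   /* one element per PE per strip */

/* y[i] += a * x[i], strip-mined into 96-wide chunks. */
void saxpy_strips(size_t n, float a, const float *x, float *y)
{
    for (size_t strip = 0; strip < n; strip += LANES) {
        size_t end = (strip + LANES < n) ? strip + LANES : n;
        /* On the card this inner strip would run across all 96 PEs at
         * once; here it is just an ordinary loop. */
        for (size_t i = strip; i < end; ++i)
            y[i] += a * x[i];
    }
}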

The issue for these cards is the memory bandwidth in and out of the PCI-X based interface. There are tricks you can play in a well-designed system, but you cannot escape the bandwidth ceiling of PCI-X. For many algorithms of potential interest to this list, memory bandwidth is as important as FP performance. Having effectively 100 processors on the far side of a narrow pipe means you have to design algorithms with that pipe width in mind.
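A quick back-of-the-envelope shows how much arithmetic each transferred byte has to carry before the card stops waiting on the bus. The bandwidth and FLOP numbers below are my assumptions for illustration, not vendor specs:

/* Back-of-the-envelope: how many flops must each byte shipped across
 * the host interface take part in before the attached processor stops
 * waiting on the bus?  All numbers here are illustrative assumptions,
 * not vendor specifications. */
#include <stdio.h>

int main(void)
{
    const double bus_bw_bytes_s = 1.0e9;   /* assumed usable PCI-X bandwidth */
    const double card_flops_s   = 50.0e9;  /* assumed sustained card FP rate */

    /* Flops the card can retire in the time it takes to move one byte. */
    const double breakeven = card_flops_s / bus_bw_bytes_s;

    printf("The card must perform ~%.0f flops per byte transferred just\n"
           "to keep the array busy; kernels with less arithmetic per byte\n"
           "are limited by the %.1f GB/s pipe, not the FP units.\n",
           breakeven, bus_bw_bytes_s / 1.0e9);

    /* Dense matrix multiply (O(n^3) flops on O(n^2) data) clears this
     * bar for large n; a streaming O(n) kernel never does. */
    return 0;
}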

If Clearspeed would consider mass production at a cost of, say, $100-$500 per card, the market would be huge, because the card would be competing with
multi-core processors like the IBM-Sony Cell.

Kahan had some interesting things to say about the Cell, summarized like this: with Cell you get to choose one, fast or accurate. He was making this point in general but pointed out some specific issues. This is from a talk on his web site. Caveat: I don't have a Cell to play with (yes Santa, I would like one or two hundred), so I can't run paranoia or other fun tests.
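For flavor, here is the sort of tiny paranoia-style probe I'd want to run on one: does single precision keep subnormals, or flush them to zero? This is only a sketch of the kind of test, not a substitute for Kahan's paranoia, and without hardware I can't say what the Cell actually does with it:

/* Tiny paranoia-style probe: does single-precision arithmetic preserve
 * subnormals, or flush them to zero?  Flush-to-zero is one of the
 * shortcuts a fast SIMD float unit may take.  A sketch only. */
#include <stdio.h>
#include <float.h>

int main(void)
{
    volatile float tiny = FLT_MIN;      /* smallest normalized float */
    volatile float sub  = tiny / 4.0f;  /* should be a subnormal     */

    if (sub > 0.0f && sub < tiny)
        printf("subnormals survive: gradual underflow looks intact\n");
    else
        printf("subnormals flushed to zero: fast, but not IEEE-accurate\n");

    return 0;
}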


You need "really big" volumes to get there. Retail pricing of $200 implies a bill of materials cost down in the sub $20 range.

Yup. Volume drives lower pricing; economies of scale matter. This is why FPGAs are priced where they are: they don't have large volumes. If they did, pricing would be better.

Considering that a run-of-the-mill ASIC spin costs >$1M (for a small number of parts produced), your volume has to be several hundred thousand (or a million) units before you even cover the cost of your development.

The video card folks can do this because
a) each successive generation of cards is derived from the past, so the NRE is lower; most of the card (and IC) is the same

I believe they are in incremental improvement mode. This keeps redesign costs way down.

b) they have truly gargantuan volumes

This is the critical thing. Remember, these are highly pipelined graphical supercomputers. The ClawHMMer project ran a hardware-accelerated HMMer on an nVidia 6800 GT 5x faster than the P4 hosting the card.

c) they have sales from existing products to provide cash to support the development of version N+1.

Cash is king.

{I leave aside the possibility of magic elves, although with some consumer products, I have no idea how they can design, produce, and sell it at the price they do. Making use of relative currency values can also help, but that's in the non-technological magic elf category, as far as I'm concerned.}

Actually lots of stuff is done outside the US these days. Not magic elves per se, but Indian and Chinese engineers and scientists who are extremely good at what they do. This starts getting into a cost and productivity discussion rather rapidly.

The most interesting niche for the Clearspeed cards, it appears to me, is accelerating proprietary applications like Matlab, Mathematica and particularly Excel, which run on a single PC and can hardly be reprogrammed by their
users to run on a distributed cluster.



I would say that there is more potential for a clever soul to reprogram the guts of Matlab, etc., to transparently share the work across multiple machines. I think that's in the back of MS's mind as they move toward a services environment and .NET.

:)

So imagine, if you will, an LD_PRELOAD environment variable that points a user's code at the relevant libraries, which work their magic behind the scenes. I would be hard-pressed to imagine using this for Excel, but could see it for Matlab. Programming at high levels with high performance. Of course Kahan also rips into them over accuracy ...
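A minimal sketch of what such an interposer might look like; the accelerated library name and its entry point below are hypothetical placeholders, only the dlopen/dlsym mechanics are real:

/* Minimal sketch of the LD_PRELOAD trick: interpose on a BLAS dgemm_
 * call and hand it to an (entirely hypothetical) accelerated library,
 * falling back to the original BLAS if the card/library isn't there.
 * Build:  gcc -shared -fPIC -o libinterpose.so interpose.c -ldl
 * Use:    LD_PRELOAD=./libinterpose.so <any program calling dgemm_>  */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

typedef void (*dgemm_fn)(const char *, const char *, const int *, const int *,
                         const int *, const double *, const double *, const int *,
                         const double *, const int *, const double *,
                         double *, const int *);

void dgemm_(const char *ta, const char *tb, const int *m, const int *n,
            const int *k, const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb, const double *beta,
            double *c, const int *ldc)
{
    static dgemm_fn accel, fallback;

    if (!accel && !fallback) {
        /* Hypothetical accelerated library and symbol name. */
        void *h = dlopen("libaccel_blas.so", RTLD_LAZY);
        if (h)
            accel = (dgemm_fn)dlsym(h, "accel_dgemm_");
        /* The real BLAS further down the link chain. */
        fallback = (dgemm_fn)dlsym(RTLD_NEXT, "dgemm_");
    }

    dgemm_fn f = accel ? accel : fallback;
    if (f)
        f(ta, tb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    else
        fprintf(stderr, "dgemm_: no BLAS found behind the interposer\n");
}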


Jim



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
