The shipment of this accelerator card has been delayed many times. The last
time I asked was October 2005.  Apparently the first shipment was made this
month, for a Japanese supercomputer with 10^4 Opterons.  The cost is not
indicated, but a price of something like $8000 or more per card would put it
outside commodity hardware.  I wouldn't be astonished if more performance
could be obtained in most applications with commodity clustering.

If Clearspeed considered mass production at a cost of around $100 to $500
per card, the market would be huge, because the card would be competing with
multi-core processors like the IBM-Sony Cell.

Possibly the most interesting niche for the Clearspeed cards, it appears to
me, is accelerating proprietary applications like Matlab, Mathematica and
particularly Excel, which run on a single PC and can hardly be reprogrammed
by their users to run on a distributed cluster.

Dan


Bill Broadley wrote:
I noticed a few news reports on Intel/AMD considering the Clearspeed
co-processor.

Looks like a fairly interesting widget, here's an Intel/Clearspeed paper
that describes it:
http://www.clearspeed.com/downloads/Intel%20Math%20Kernel%20whitepaper.pdf

Some interesting snippets on the Clearspeed Advance board:
* 192 pipelines, 2 flops per clock (not fused), 250 MHz, peak 96 GFlops
  (I believe this is for 2 chips)
* 50 GFlops sustained with the DGEMM kernel
* 1 GB of RAM per board.
* 128 registers per PE, register file allows 3 reads 2 writes per clock
* 1.44 MB of SRAM that can deliver one word per FP op per clock.
* 800 MB/sec over PCI-X, enough for 50 GFlops on DGEMM.
* Less than 10 watts while sustaining 25 GFlops
* 1-D complex FFTs of 1024 elements @ 400k per second (20 GFlops with 32-bit),
  but only 1/4 of that when streaming because of PCI-X bottlenecks.
* 12 GFlops when running 2-D FFTs (512x512 single precision) that are
  resident on board (in the 1 GB)
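
The figures in the list above can be cross-checked with a little arithmetic.
What follows is only a rough sketch in Python, not anything taken from the
whitepaper. The function names are made up for illustration, and the FFT
estimate assumes the common 5*N*log2(N) flop count for a complex FFT of
length N, which may or may not be the convention the paper uses.

```python
import math

def peak_gflops(pipelines, flops_per_clock, clock_hz):
    """Theoretical peak: every pipeline busy on every clock."""
    return pipelines * flops_per_clock * clock_hz / 1e9

def fft_gflops(n, transforms_per_sec):
    """Rate for 1-D complex FFTs of length n.

    Assumption: 5*n*log2(n) flops per transform (a common convention,
    not confirmed by the source).
    """
    return 5 * n * math.log2(n) * transforms_per_sec / 1e9

def required_flops_per_byte(gflops, bytes_per_sec):
    """Flops that must be done per byte moved over the bus to sustain
    the given rate at the given bus bandwidth."""
    return gflops * 1e9 / bytes_per_sec

# Numbers quoted in the list above.
print(peak_gflops(192, 2, 250e6))           # peak for the board
print(fft_gflops(1024, 400e3))              # 1024-point FFTs at 400k/s
print(required_flops_per_byte(50, 800e6))   # DGEMM at 50 GFlops over PCI-X
```

The first two results line up with the quoted 96 GFlops peak and roughly
20 GFlops for the FFTs. The third says that DGEMM has to do about 62.5 flops
for every byte crossing PCI-X, which is plausible only because large matrix
blocks are reused many times once they are on the board.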

In any case it looks like an interesting development.

Speaking of which, what is the double precision peak rate of today's P4 and
Opteron?  One 128-bit SSE operation every other cycle (so 1 64-bit
flop per cycle)?  I believe Intel mentioned doubling this rate at IDF
(shipping sometime in the 2nd half of this year).
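
The reasoning in that question can be written out as a small calculation.
This is a sketch under stated assumptions only: the 3.0 GHz clock is a
hypothetical example, not a figure from this thread, and the `units`
parameter (number of independent floating point pipes) is likewise an
illustrative knob rather than a statement about either chip.

```python
def flops_per_cycle(vector_bits, element_bits, cycles_per_op):
    """Flops per cycle for one SIMD pipe that issues one vector
    operation every `cycles_per_op` cycles."""
    elements = vector_bits // element_bits
    return elements / cycles_per_op

def peak_gflops(clock_ghz, per_cycle, units=1):
    """Peak rate in GFlops; `units` is a hypothetical count of
    independent FP pipes (an assumption, not from the source)."""
    return clock_ghz * per_cycle * units

# One 128-bit SSE op every other cycle on 64-bit doubles.
per_cycle = flops_per_cycle(128, 64, 2)
print(per_cycle)

# Hypothetical clock, chosen only to illustrate the formula.
EXAMPLE_CLOCK_GHZ = 3.0
print(peak_gflops(EXAMPLE_CLOCK_GHZ, per_cycle))
```

Under those assumptions a single pipe gives one double precision flop per
cycle, so the peak in GFlops is simply the clock in GHz times the number of
such pipes.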

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
