http://www.cse.scitech.ac.uk/disco/novel_arch.shtml
Novel Architectures
As microprocessor manufacturers find it increasingly difficult to
keep up with Moore's law, the HPC community is beginning to look
seriously at the potential performance gains promised by a number of
novel architectures. This page provides an overview of some of these
architectures and presents some early performance comparisons between
these novel approaches and more conventional processors.
FPGAs
These reconfigurable processors require significantly less power than
conventional processors and could significantly increase compute
density in HPC systems. Because FPGAs provide access to completely
reconfigurable logic, the potential performance increases they offer
are huge. Performance boosts of well over 100X have been reported for
certain applications when compared with conventional processors. The
trouble is that not all applications are well suited to FPGAs. This is
particularly the case for applications that are intensive in double
precision floating point, as basic double precision floating point
cores consume large amounts of logic. The devices are also very
difficult to program efficiently without extensive hardware design
experience. A number of HPC vendors, including Cray and SGI, now
produce systems that are designed to accommodate FPGAs as
co-processors. For more information about FPGAs visit our FPGA page.
Cell
The Cell Broadband Engine from IBM/Toshiba/Sony has been designed
primarily for the Sony PlayStation 3 games console and will therefore
be produced in very large volumes. The hope is that this will make the
Cell an affordable option for large HPC systems. The Cell processor
itself is made up of nine processors operating on a shared, coherent
memory. The first generation of Cell has a single Power
Architecture-based control processor (PPU) and eight SIMD Synergistic
Processor Units (SPUs), but different configurations are likely to
emerge.
IBM's Cell Broadband Engine resource center:
http://www-128.ibm.com/developerworks/power/cell/
ClearSpeed
ClearSpeed produces the Advance Accelerator PCI-X and PCIe boards,
which work by offloading compute-intensive math library routines
called by applications running on the host processor. ClearSpeed's
website reports that its CSX600 co-processor provides 25 GFLOPS of
sustained single or double precision floating point performance, while
dissipating a maximum of 10 Watts (25 Watts per board). The CSX600 is a
system-on-a-chip (SoC) with predefined functionality that, unlike an
FPGA, cannot be reconfigured. What the chip loses to FPGAs in
flexibility it more than makes up for in usability: applications that
already make use of standard math libraries (level 3 BLAS and FFTW)
should work on these cards without the need to port code.
More recently ClearSpeed has introduced so-called CATS units, which
pack twelve boards into one 1U server. We've got two CATS units
attached to our cseem64t cluster.
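The power figures quoted above can be turned into rough efficiency numbers. The following is a minimal sketch of that arithmetic, assuming the quoted maximum dissipation applies at the quoted sustained rate and ignoring host and chassis power entirely. The constant and function names are hypothetical and chosen only for illustration.

```python
# Figures quoted in the text above (vendor-reported, not measured here).
CSX600_GFLOPS = 25.0       # sustained GFLOPS per CSX600 chip
CSX600_MAX_WATTS = 10.0    # maximum dissipation per chip
BOARD_MAX_WATTS = 25.0     # maximum dissipation per accelerator board
BOARDS_PER_CATS = 12       # boards in one 1U CATS unit


def gflops_per_watt(gflops, watts):
    """Return performance per watt for the given figures."""
    return gflops / watts


def cats_board_power_watts(boards=BOARDS_PER_CATS,
                           watts_per_board=BOARD_MAX_WATTS):
    """Upper bound on accelerator board power in one CATS unit.

    Assumption: every board draws its quoted maximum; the host server
    and cooling are not included.
    """
    return boards * watts_per_board


if __name__ == "__main__":
    print(gflops_per_watt(CSX600_GFLOPS, CSX600_MAX_WATTS), "GFLOPS/W per chip")
    print(cats_board_power_watts(), "W of boards per CATS unit")
```

On these assumptions a single chip delivers 2.5 GFLOPS per Watt, and the twelve boards in a CATS unit draw at most 300 Watts between them.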
General-Purpose Computation on GPUs
With the increasing programmability of commodity graphics processing
units (GPUs), these chips are now considered to be useful for
performing more than the specific graphics computations for which they
were designed. They are now seen by some as capable coprocessors,
useful for a variety of applications including scientific computing.
http://www.gpgpu.org/ catalogs the current and historical use of GPUs
for general-purpose computation.
DGEMM Performance
This bar chart provides an indication of DGEMM performance in
Gflop/s for a number of conventional and novel architectures.

The Cell (simulation), Virtex II Pro FPGA and Cray X1E vector processor
all achieve in the region of 15 Gflop/s sustained DGEMM performance.
This is two to three times the performance offered by current Itanium
and Opteron processors. It is likely that the latest Xilinx Virtex 4
and Virtex 5 FPGAs would be able to significantly outperform the Virtex
II Pro (approximately 3X speedup on the largest chips). The ClearSpeed
CSX600 processor sustains 25 Gflop/s, almost double the DGEMM
performance of Cell, but you would expect this from a dedicated
floating point co-processor when compared to more general purpose
chips. Finally, the Cell+ is an optimized version of the Cell
architecture proposed by a team at Lawrence Berkeley National
Laboratory in the US. Simulations based on a performance model for the
Cell+ indicate that it could achieve 51 Gflop/s for DGEMM.
Data for the Cell and Cell+ are performance predictions taken from
"The Potential of the Cell Processor for Scientific Computing",
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry
Husbands, Katherine Yelick, May 2006.
http://www.cs.berkeley.edu/~samw/projects/cell/CF06.pdf
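For readers unfamiliar with how a DGEMM figure in Gflop/s is usually derived, the sketch below shows the conventional calculation: count 2*m*n*k floating point operations for C = alpha*A*B + beta*C and divide by the elapsed time. This page does not state how the charted figures were measured, so the operation count is an assumption based on common practice. The function names are hypothetical, and the pure Python reference loop stands in for an optimized BLAS routine.

```python
import time


def dgemm_flops(m, n, k):
    """Operation count for C = alpha*A*B + beta*C by the usual
    2*m*n*k convention (one multiply and one add per inner step)."""
    return 2 * m * n * k


def naive_dgemm(alpha, a, b, beta, c):
    """Reference DGEMM on row-major lists of lists.

    a is m x k, b is k x n, c is m x n; returns a new m x n matrix.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = alpha * acc + beta * c[i][j]
    return out


def gflops(m, n, k, seconds):
    """Convert a problem size and elapsed time into Gflop/s."""
    return dgemm_flops(m, n, k) / seconds / 1e9


if __name__ == "__main__":
    size = 64  # small, because the reference loop is slow
    a = [[1.0] * size for _ in range(size)]
    b = [[2.0] * size for _ in range(size)]
    c = [[0.0] * size for _ in range(size)]
    start = time.perf_counter()
    naive_dgemm(1.0, a, b, 0.0, c)
    elapsed = time.perf_counter() - start
    print(f"{gflops(size, size, size, elapsed):.4f} Gflop/s")
```

An interpreted loop like this will run far below the rates discussed above; the point is only to show the bookkeeping behind a sustained Gflop/s number.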