Mersenne: architecture comments

brad Sun, 1 Nov 1998 23:46:33 -0500
The important difference between CISC and RISC machines is that
CISC machines are microcoded and RISCs are not.  With transistor
counts on the order of 10M, all machines are complex today.

The legacy design philosophy of RISC is to maximize the utilization
of transistors.  From this we can dispel the myth that the evolution
of architecture means ever wider paths, more registers and more of
every other resource.  Whenever we add more of one resource, we trade
away some of another.  Also, as a resource is increased, the
returns diminish.

The Alpha and i64 (Merced) have lower instruction densities than is
optimal.  Alpha uses about 50% too many bits to encode an instruction
stream and it looks like the i64 will use about two times too many.
Utilization of I-cache and chip pins suffers with low instruction
density.
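
As a rough check on those figures, the arithmetic can be sketched in
Python.  The sketch assumes the Alpha's fixed 32-bit instruction word
and the published i64 format of 128-bit bundles holding three
operations.  The helper name is made up for illustration, and "50% too
many" is read as 1.5 times optimal and "two times too many" as 2 times
optimal.

```python
from fractions import Fraction

def bits_per_operation(word_bits, ops_per_word):
    """Average number of encoding bits spent per operation (illustrative helper)."""
    return Fraction(word_bits, ops_per_word)

# Assumption: Alpha encodes one operation per fixed 32-bit word.
alpha = bits_per_operation(32, 1)
# Assumption: i64 packs three operations into one 128-bit bundle.
i64 = bits_per_operation(128, 3)

# Work backwards from the ratios quoted in the text to the implied optimum.
optimal_from_alpha = alpha / Fraction(3, 2)  # "50% too many bits"
optimal_from_i64 = i64 / 2                   # "two times too many"

print(float(alpha), float(i64))
print(float(optimal_from_alpha), float(optimal_from_i64))
```

Under that reading both ratios point at the same implied optimum, a
little over 21 bits per operation.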

The low instruction density of the i64 is the result of the LIW
approach and having too many registers (128).  Alpha has too few
registers (32).  The optimal number is 64 to satisfy the trade-off
between register pressure and instruction density.
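
The cost of a larger register file in encoding bits can be sketched as
follows.  This assumes three register operands per instruction (two
sources and one destination); the function name is hypothetical and
the sketch ignores opcode, immediate and predicate fields.

```python
def operand_field_bits(num_registers, operands=3):
    """Bits needed to name `operands` registers out of `num_registers`."""
    # Bits required to select one register from the file.
    bits_per_register = (num_registers - 1).bit_length()
    return bits_per_register * operands

for n in (32, 64, 128):
    print(n, "registers:", operand_field_bits(n), "operand bits")
```

Each doubling of the register file costs one more bit per operand, so
going from 32 to 128 registers adds 6 bits to a three-operand
instruction.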

One aspect of the i64's EPIC approach is the encoding of up to
3 operations per instruction.  This they got right, and it targets
a major weakness of the Alpha.  The Alpha must resort to complex
4-way and 8-way instruction issue schemes.  The scheduling logic
is a poor use of transistors.

The most overlooked flaw in machines is the depth of the instruction
pipe.  The Alpha has a short clock period, resulting in impressive
MHz specs.  This advantage is lost in the additional stages required
in the instruction pipe to support the short clock.  The i64 has the
potential for a short pipe, but I don't have enough information yet.
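
A toy model shows the direction of this effect.  Every number below is
invented for illustration, not measured on any real chip, and the
model counts only branch-mispredict flushes on a single-issue pipe.

```python
def time_per_instruction_ns(clock_ns, flush_stages, branch_fraction, mispredict_rate):
    """Average time per instruction, counting only mispredict flushes (toy model)."""
    # Each mispredicted branch is assumed to waste `flush_stages` cycles.
    cycles = 1.0 + branch_fraction * mispredict_rate * flush_stages
    return clock_ns * cycles

# Hypothetical short pipe with a slower clock.
short_pipe = time_per_instruction_ns(3.0, flush_stages=4,
                                     branch_fraction=0.2, mispredict_rate=0.1)
# Hypothetical deep pipe with a faster clock.
deep_pipe = time_per_instruction_ns(2.0, flush_stages=10,
                                    branch_fraction=0.2, mispredict_rate=0.1)

clock_speedup = 3.0 / 2.0
actual_speedup = short_pipe / deep_pipe
print(clock_speedup, round(actual_speedup, 2))
```

With these made-up inputs the deep pipe's clock is 1.5 times faster
but its throughput is only about 1.35 times better.  How much of the
advantage is really lost depends on the branch behaviour of the code
and on the other stalls the model leaves out.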

As for 64-bit addresses, this is hardly more than marketing
hype.  If needed, the address space can be extended more effectively.
Splitting instruction and data into separate spaces buys more space
cheaply.  If still more is needed, a second data space can be added
with 64-bit data access, yielding an additional 64GB.

Finally, the LL test just needs more multipliers.  This can be had
with wider data paths, pipelined multipliers, or parallel multiplies.
Wider data paths are probably the least effective of the three.
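
For reference, a minimal sketch of the LL test shows why multiplies
dominate: each of the p-2 iterations is one big squaring.  This leans
on Python's built-in big integers purely for illustration; production
programs do the squaring with FFT-based multiplication instead.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # One squaring modulo 2**p - 1 per iteration; this is the hot spot.
        s = (s * s - 2) % m
    return s == 0

exponents = [p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)]
print(exponents)  # 11 drops out because 2**11 - 1 = 2047 = 23 * 89
```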