On 29 Nov 00, at 20:31, Jason Stratos Papadopoulos wrote:
> > Humm, too much slowdown for single Ia32 instructions, Intel engineers
> > will know the reasons.
Presumably something to do with the architecture not being parallel:
there's only one set of flags, so you can't start an ADC until the
previous instruction has completed and the flags register has been
retired. In IA32, there's nothing to tell you _which_ carry flag to
use in an instruction that takes carry as an input, or output.
>
> This and the 14-18 clock multiply are extremely depressing. For all the
> marketing hype about Intel catering to the internet and the next
> generation of "active media", they're making it awfully difficult to
> implement the cryptography that the internet needs.
Well, you're not _forced_ to buy Intel - even if you stick to
Windoze! I get the impression that, with the Athlon architecture, AMD
are making serious inroads into the market - especially at the lower
end. Now it just so happens that the Athlon architecture runs Prime95
rather well.
>
> The alpha was already at least 5x faster than a PIII for multiprecision
> arithmetic at the same clock speed; with the P4 it will only get worse.
Are you sure about this? I think, with Alpha, you have to execute two
multiply instructions to get a double-precision product - one
instruction can store either the low half or the high half of the
product, but not both.
Of course, the Alpha's single-precision integer arithmetic is 64 bits
wide, not 32. This does help somewhat :)
It's also easy to get confused between the latency (the time from
issuing an instruction to its result becoming available) and the
throughput (the inverse of the longest time such an instruction ties
up any one execution unit, multiplied by the number of copies of that
critical unit in the design).
The fact remains that the P4, like all other commercial processors,
is a compromise. This is what makes it so darned complicated, and
also indicates why the designers have to make performance tradeoffs,
some of which don't suit our particular application. It would be
possible to optimize the design so that it executed the instructions
_we_ find useful much more quickly, but whether such a processor
would be capable of running standard commercial benchmarking programs
at a reasonable speed - or indeed at all - is open to question ...
we'd probably want to reduce the inherent complexity by junking the
16-bit x86 legacy, for a start...
Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers