Al Boldi wrote:
> Christian Budde wrote:
> > I'm not into FPC, but this looks as if the compiler doesn't put the
> > variable (or branch address) on an address that is divisible by 16. On
> > some Intel processors that makes a huge difference, especially in the
> > case of branches.
>
> So you are saying 'var i:double' has to be on an address divisible by 16
> for some Intel processors.
>
> I found this:
>   Pentium-MMX  0% slowdown
>   PentiumII  222% slowdown
>   Pentium4   333% slowdown
>
> Can somebody post an asm that more easily shows this problem?
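
To at least check the alignment theory, the address test itself is simple.
Here is a rough sketch in C rather than Pascal; is_aligned is a name I made
up for illustration, not anything from FPC or its RTL. Note that a double
on an 8-byte boundary can never straddle a cache line, so 8 is probably the
more interesting value to test than 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p lies on a multiple of `alignment`
 * bytes, 0 otherwise. `alignment` is assumed to be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    /* For a power of two, the low bits of the address are the remainder. */
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

Calling this with the address of the variable in question and 8 or 16 would
show whether the compiler placed it on such a boundary.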

Ok, http://www.emulators.com/docs/pentium_1.htm says this:

Finally that dreaded partial register stall! The one serious bug in the P6 
design that can cause legacy code to run slower. By "legacy code" I mean 
code written for a previous version of the processor.

< snip >

While every other optimization in the P6 family pretty much boosts 
performance without requiring the programmer to rewrite one single line of 
code, even the 4-1-1 decode rule, the register renaming optimization has one 
fatal flaw that kills performance: partial register stalls! A partial 
register stall is when a partial register (that is, the AL, AH, and AX parts 
of the EAX register, the BL, BH, and BX parts of the EBX register, etc) gets 
renamed to a different internal register because the processor believes the 
uses are mutually exclusive.

< snip >

This is perfectly valid code, and runs perfectly fine on the 486, Pentium 
classic, and AMD processors, but suffers a partial register stall on any of 
the P6 processors. On the Pentium Pro the stall costs about 12 clock cycles, 
and on the newer Pentium III about 4 clock cycles.


Why does the partial register stall occur? Because internally the AL and 
EAX registers get mapped to two different internal registers. The 
processor does not discover the mistake until the second micro-op is about 
to execute, at which point it needs to stop and re-execute the instruction 
properly. This results in the pipeline being flushed and the processor 
having to decode the instructions a second time.


How to solve the problem? Well, Intel DID tell developers how to avoid the 
problem. Most didn't listen. The way you work around a partial register 
stall is to clear a register, either using an XOR operation on itself, a SUB 
on itself, or moving the value 0 into the register. (Ironically, SBB, which 
is almost identical to SUB, does not do the trick!) Using one of these three 
tricks will flag the register as being clear, i.e. zero. This allows the 
second use of the instruction to be mapped to the same internal register. No 
stall.

< snip >

--end of quote
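
To make the quoted description concrete, here is a minimal sketch of the two
instruction patterns as I understand them. It is not the code that was
snipped from the article. It uses GCC/Clang inline asm (AT&T syntax) instead
of FPC's assembler, the function names are my own invention, and it only
shows the patterns; it does not measure any stall, and on CPUs newer than
the P6 family the timing may well look different.

```c
#include <stdint.h>

/* Use inline asm only on x86 with GCC/Clang; otherwise fall back to plain C
 * so the sketch still compiles and returns the same values. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* Hypothetical example of the stalling pattern: a full write to EAX,
 * then a partial write to AL, then a full read of EAX. */
static uint32_t merge_byte_stalling(uint32_t word, uint8_t b)
{
    uint32_t result;
#if HAVE_X86_ASM
    __asm__ volatile(
        "movl %1, %%eax\n\t"  /* full 32-bit write to EAX */
        "movb %2, %%al\n\t"   /* partial write: only AL changes */
        "movl %%eax, %0\n\t"  /* full read of EAX right after the partial write */
        : "=r"(result)
        : "r"(word), "q"(b)
        : "eax");
#else
    result = (word & 0xFFFFFF00u) | b;
#endif
    return result;
}

/* Hypothetical example of the workaround from the quote: clear the whole
 * register with XOR before the partial write. */
static uint32_t widen_byte_cleared(uint8_t b)
{
    uint32_t result;
#if HAVE_X86_ASM
    __asm__ volatile(
        "xorl %%eax, %%eax\n\t"  /* clear EAX by XORing it with itself */
        "movb %1, %%al\n\t"      /* partial write to AL */
        "movl %%eax, %0\n\t"     /* full read of EAX */
        : "=r"(result)
        : "q"(b)
        : "eax", "cc");
#else
    result = b;
#endif
    return result;
}
```

Both functions return well-defined values everywhere; only the first one
contains the write-AL-then-read-EAX sequence without a preceding clear.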

Florian, what's the status on FPC?

Thanks!

--
Al
