Al Boldi wrote: > Christian Budde wrote: > > I'm not into FPC, but this looks like if the compiler doesn't put the > > variable(/branch address) on an address which is dividable by 16. On > > some Intel processors, that makes a huge difference, especially in case > > of branches. > > So you are saying 'var i:double' has to be on an address dividable by 16 > for some Intel processors. > > I found this: > Pentium-MMX 0% slowdown > PentiumII 222% slowdown > Pentium4 333% slowdown > > Can somebody post an asm that more easily shows this problem?
Ok, http://www.emulators.com/docs/pentium_1.htm says this: Finally that dreaded partial register stall! The one serious bug in the P6 design that can cause legacy code to run slower. By "legacy code" I mean code written for a previous version of the processor. < snip > While every other optimization in the P6 family pretty much boosts performance without requiring the programmer to rewrite one single line of code, even the 4-1-1 decode rule, the register renaming optimization has one fatal flaw that kills performance: partial registers stalls! A partial register stall is when a partial register (that is, the AL, AH, and AX parts of the EAX register, the BL, BH, and BX parts of the EBX register, etc) get renamed to different internal registers because the processor believes the uses are mutually exclusive. < snip > This is perfectly valid code, and runs perfectly fine on the 486, Pentium classic, and AMD processors, but suffers a partial register stall on any of the P6 processors. On the Pentium Pro a stall of about 12 clock cycles, and on the newer Pentium III about 4 clock cycles. Why does the partial register stall occur? Because internally the AL register and the EAX registers get mapped to two different internal registers. The processor does not discover the mistake until the second micro-op is about to execute, at which point it needs to stop and re-execute the instruction properly. This results in the pipeline being flushed and the processor having to decode the instructions a second time. How to solve the problem? Well, Intel DID tell developers how to avoid the problem. Most didn't listen. The way you work around a partial register stall is to clear a register, either using an XOR operation on itself, a SUB on itself, or moving the value 0 into the register. (Ironically, SBB which is almost identical to SUB, does not do the trick!) Using one of these three tricks will flag the register as being clear, i.e. zero. This allows the second use of the instruction to be mapped to the same internal register. No stall. < snip > --end of quote Florian, what's the status on FPC? Thanks! -- Al _________________________________________________________________ To unsubscribe: mail [EMAIL PROTECTED] with "unsubscribe" as the Subject archives at http://www.lazarus.freepascal.org/mailarchives
