drew writes:
>My best guess (if it isn't a bug) would be that it's there to keep the stack 
>on a 32 byte (IIRC, this sounds like cache line size for the newer 
>Intel chips) 

This discussion piqued my curiosity, so I pulled up the Pentium III
optimization manual.  To quote it:

    On Pentium II and Pentium III processors, a misaligned access that
    crosses a cache line boundary does incur a penalty.  A Data Cache
    Unit (DCU) split is a memory access that crosses a 32-byte line boundary.
    Unaligned accesses may cause a DCU split and stall Pentium II and Pentium
    III processors.  For best performance, make sure that in data structures
    and arrays greater than 32 bytes, the structure or array elements
    are 32-byte-aligned and that access patterns to data structure
    and array elements do not break the alignment rules.

IOW, the stack pointer adjustment is there so that doubles (and
80-bit floats, if GCC supports those; does it do a long double for
Intel targets?) in the called function don't cross a 32-byte data
cache line boundary.
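
As a rough sketch of the rule in the quote: an access splits when its
first and last bytes land in different 32-byte lines.  The names
crosses_line and CACHE_LINE below are mine, not from the manual; the
value 32 is the PII/PIII line size the manual gives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size: 32 bytes, the PII/PIII figure from the quote. */
#define CACHE_LINE 32u

/*
 * Hypothetical helper: returns 1 if an access of `size` bytes starting
 * at `addr` touches more than one CACHE_LINE-sized line, 0 otherwise.
 */
int crosses_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    /* Compare the line index of the first byte with that of the last. */
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}
```

For example, an 8-byte double at offset 28 covers bytes 28..35 and so
splits, while the same double at offset 24 stays inside one line.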

>Two instructions instead of 1 would help to facilitate alignment of the 
>return address (I think 16 bytes is a good alignment for a jmp, and 
>I can't see why a ret wouldn't be the same)

PII/PIII processors prefetch on 16-byte boundaries, so having
the return address on such a boundary may cut the number of prefetches
and (marginally) improve performance.
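
To put a number on "cut the number of prefetches", here is a small
sketch that counts how many 16-byte blocks a run of code bytes spans.
The function name and FETCH_BLOCK constant are made up for
illustration, and the 16-byte figure is just the one stated above.  It
only counts blocks; it says nothing about real cycle cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed prefetch granularity: 16 bytes, per the paragraph above. */
#define FETCH_BLOCK 16u

/*
 * Hypothetical helper: number of FETCH_BLOCK-sized aligned blocks
 * touched by `len` bytes of code starting at address `start`.
 */
size_t fetch_blocks_spanned(uintptr_t start, size_t len)
{
    assert(len > 0);
    /* Index of the last block minus index of the first, plus one. */
    return (size_t)((start + len - 1) / FETCH_BLOCK - start / FETCH_BLOCK) + 1;
}
```

With these assumptions, 10 bytes of code starting at 0x1000 sit in one
block, while the same 10 bytes starting at 0x100c straddle two.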

-- 
<a href="http://www.poohsticks.org/drew/">Home Page</a>
For those who do, no explanation is necessary.  
For those who don't, no explanation is possible.