"distcc" <[email protected]> wrote in message news:[email protected]... > nedbrek Wrote: >> "Walter Bright" <[email protected]> wrote in message >> news:[email protected]... >>> nedbrek wrote: >>>> Also, "macro op fusion" allows you can get a branch along with the last >>>> instruction in decode, potentially giving you 5 macroinstructions per >>>> cycle from decode. Make sure it is the flags producing instruction >>>> (cmp-br). >>>> >>> >>> I can't find any Intel documentation on this. Can you point me to some? >> >> The best available source is the optimization reference manual >> (http://www.intel.com/products/processor/manuals/). The latest version >> is >> 248966.pdf, which mentions "Decodes up to four instructions, or up to >> five >> with macro-fusion" (page 33). Also, page 36: "Macro-fusion merges two >> instructions into a single ?op. Intel Core microarchitecture is capable >> of >> one macro-fusion per cycle in 32-bit operation". It's unclear if macro >> fusion is off entirely in 64 bit mode, and whether this has changed in >> more >> recent processors... > > I remember reading that macro fusion is entirely off in 64 bit mode in > Nehalem > and earlier generations, and supported in Sandy Bridge. > > When generating code for loops, the compiler could also make use of Loop > Stream > Coder to avoid i-cache misses.
Serves me right, it is a little further in, on page 52: "In Intel microarchitecture (Nehalem), macro-fusion is supported in 64-bit mode, and the following instruction sequences are supported: (big list)".

That would leave it off in 64-bit mode on the 65nm (Merom) and 45nm (Penryn) parts. These are identifiable through CPUID.

The guide is broken up into sections based on the particular chip, so you end up having to read them all to get a general feel for things...

Ned
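
To make the CPUID remark concrete, here is a rough sketch of how a code generator might map a CPUID leaf 1 signature (the value returned in EAX) to the parts discussed above. Everything named here is hypothetical: the function names, the model table, and the fusion table are my own illustration. The family/model numbers are from Intel's CPUID tables as I remember them, and the True/False values only encode the reading of the manual given in this thread, so check both before relying on them. The sketch takes the EAX value as an argument; actually executing CPUID is left to whatever intrinsic your toolchain provides.

```python
# Assumption: display family/model pairs recalled from Intel's CPUID tables.
# Verify against the current manuals before using this for real decisions.
KNOWN_MODELS = {
    (0x6, 0x0F): "Merom",    # 65nm
    (0x6, 0x17): "Penryn",   # 45nm
    (0x6, 0x1A): "Nehalem",
    (0x6, 0x1E): "Nehalem",
    (0x6, 0x1F): "Nehalem",
    (0x6, 0x2E): "Nehalem",
}

# Assumption: this only encodes the conclusion reached in this thread
# (fusion in 64-bit mode on Nehalem, not on Merom/Penryn).
FUSION_IN_64BIT = {"Merom": False, "Penryn": False, "Nehalem": True}


def decode_signature(eax):
    """Split a CPUID leaf 1 EAX value into (display_family, display_model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used when the base family is 6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def fuses_in_64bit(eax):
    """Return True/False for parts in the table, None when the part is unknown."""
    family, model, _ = decode_signature(eax)
    name = KNOWN_MODELS.get((family, model))
    return FUSION_IN_64BIT.get(name)
```

Returning None for anything outside the table is deliberate: the thread leaves later processors open, so the sketch does not guess for them.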
