On Mon, Jul 25, 2016 at 01:46:43PM -0700, Guy Sotomayor Jr wrote:
>> On Jul 25, 2016, at 1:34 PM, Sean Conner <[email protected]> wrote:
>> It was thus said that the Great Peter Corlett once stated:
>>> Unsurprisingly, the x86 ISA is brain-damaged here, in that some
>>> instructions (e.g. "inc") only affect some bits in EFLAGS, which causes a
>>> partial register stall. The recommended "fix" is to avoid such
>>> instructions.
>>
>> I'm not following this. On the x86, the INC instruction modifies the
>> following flags: O, S, Z, A and P. So okay, I need to avoid INC to prevent
>> a partial register stall, therefore, I need to use ADD. Let me check ...
>> hmm ... ADD modifies the following: O, S, Z, A, P and C. So now I need to
>> avoid ADD as well? I suppose I could use LEA but then there goes my bignum
>> addition routine ...
>>
>> -spc (Or am I missing something?)

Yes, in that I was taking a potshot at x86's expense, and skipped the
technical details because contemporary x86 architecture is seriously
off-topic. But since I've now been asked...

> No Peter is wrong. All of the modern x86 (at least the Intel CPUs) are OOO
> machines with large register files (192 comes to mind) that do register
> renaming to map the register(s) used by a particular instruction back into an
> “architectural” register (no copy is actually done). The flags register is
> also part of the register re-naming. The only stalls occur when one
> instruction needs the results from an instruction that hasn’t committed it’s
> results yet (ie the instruction is still in “flight”).

It is the *partial* update that's key. INC writes O, S, Z, A and P but
leaves C untouched, so after an INC the current value of C still belongs to
whichever earlier instruction last wrote it. If you do an INC and then read
EFLAGS, or execute an instruction such as JBE that needs C and some other
flag(s), the information has to be derived from *two* renamed registers.
This typically involves an extra micro-op in the instruction stream to do
the fixup, although the details will obviously vary by CPU model. ADD
doesn't have this problem because it writes all six flags at once.

But I'm only repeating this information from the experts, so if you still
think I'm wrong, read their reference material:

http://www.agner.org/optimize/microarchitecture.pdf
is Agner Fog's optimisation guide, with more detail than mere mortals really
need. Page 154 covers this for the latest Skylake CPUs and uses INC in its
example.

http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
is Intel's own optimisation guide. Section 3.5.2.6 discusses partial flag
register stalls, although it doesn't specifically mention INC.

This stuff is way more complex than any normal person can keep in their
head. It's possible to learn all the edge cases and avoid the performance
hit in hand-written assembly, but it's a lot easier to just give it to the
compiler to puzzle out. That's its job.

Can we now go back to talking about interesting CPUs? :)
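
P.S. For anyone who wants the partial update in concrete form, here is a toy
model in C. It is only a sketch of the architectural behaviour (which flags
INC and ADD write), not of how any real core implements renaming. The helper
names flags_add8, flags_inc8 and jbe_taken are made up for illustration; the
bit positions are the documented EFLAGS ones.

```c
#include <stdint.h>

/* EFLAGS bit positions of the six arithmetic status flags. */
enum {
    FLAG_CF = 1 << 0,   /* carry */
    FLAG_PF = 1 << 2,   /* parity of the low result byte */
    FLAG_AF = 1 << 4,   /* auxiliary (nibble) carry */
    FLAG_ZF = 1 << 6,   /* zero */
    FLAG_SF = 1 << 7,   /* sign */
    FLAG_OF = 1 << 11   /* signed overflow */
};

/* Hypothetical helper: all six flags as written by an 8-bit ADD a, b. */
static uint32_t flags_add8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    unsigned res = sum & 0xFFu;
    unsigned p = res;
    uint32_t f = 0;

    /* Fold the byte down so that bit 0 holds the XOR of all eight bits. */
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;

    if (sum > 0xFFu)                    f |= FLAG_CF;
    if (!(p & 1u))                      f |= FLAG_PF;  /* even number of 1 bits */
    if ((a ^ b ^ res) & 0x10u)          f |= FLAG_AF;
    if (res == 0)                       f |= FLAG_ZF;
    if (res & 0x80u)                    f |= FLAG_SF;
    if ((a ^ res) & (b ^ res) & 0x80u)  f |= FLAG_OF;
    return f;
}

/*
 * Hypothetical helper: flags visible after an 8-bit INC a.
 * INC writes O, S, Z, A and P but not C, so the visible value is a merge
 * of two producers: C from whatever wrote the flags earlier, and the rest
 * from the INC itself. This merge is the fixup discussed above.
 */
static uint32_t flags_inc8(uint32_t older_flags, uint8_t a)
{
    uint32_t from_older = older_flags & FLAG_CF;
    uint32_t from_inc   = flags_add8(a, 1) & ~(uint32_t)FLAG_CF;
    return from_older | from_inc;
}

/* JBE is taken when CF or ZF is set, so after an INC it reads both producers. */
static int jbe_taken(uint32_t flags)
{
    return (flags & (FLAG_CF | FLAG_ZF)) != 0;
}
```

In this model, flags_add8(0xFF, 1) sets C because the byte wraps, while
flags_inc8(0, 0xFF) leaves C clear: the carry bit simply passes through from
the older value. That pass-through is why a flag reader after INC depends on
two instructions rather than one.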
