AMD has officially posted the erratum for the CPU bug I found! It is erratum #721 and can be found here:
http://support.amd.com/us/Processor_TechDocs/41322_10h_Rev_Gd.pdf

The erratum includes an MSR workaround. I tested the workaround and it does appear to fix my test case, and I saw no discernible difference in performance. I would like to thank the folks at AMD who diligently tracked the bug down based on the test case I provided, and I would like to thank everyone who sent supportive emails.

--

Now that the bug is known and an MSR workaround is available, one of my developers will be posting a simplified test case here too. He's been chomping at the bit :-) But I asked him to wait until AMD posted their MSR workaround.

One last thing I would like to note: Because of the instant nature of communication these days, I was taken a bit by surprise by how quickly misinformation about the bug spread. I take responsibility for this because I simply did not post enough information in my original missive after AMD confirmed the bug. We want CPU vendors to feel that they can communicate with developers to the mutual benefit of both, so I feel quite badly about it.

It is important that people keep in mind that there *IS* an MSR workaround for this bug and also that, despite the fact that it occurs using normal instruction sequences, this bug is quite difficult to reproduce in real-life scenarios. It isn't just a simple sequence of instructions... it requires a very deep recursion and a particular stack alignment. The stars have to align, basically. We probably would never have found the bug if DragonFly hadn't had user stack randomization turned on by default :-).

And with respect to GCC, we tend to use a mid-level of optimization rather than a high level and, frankly, I was never able to reproduce the bug with any GCC binary other than the *particular* one from late last year, and only at particular starting stack offsets. We have never observed this bug outside of the GCC test case and the simplified test case that will soon be posted.

So, again:

* There is an MSR workaround (set bit 0 of MSR C001_1029 to 1).
* No discernible performance loss after programming the MSR.
* All family 10h CPUs are affected, both Phenom and Opteron.
* (Not sure about 12h.)
* Bulldozer is NOT affected by the bug.

Thank you all,

-Matt
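
P.S. For anyone wondering what "program MSR C001_1029 bit 0 to a 1" looks like in practice, here is a rough sketch of the read-modify-write involved. It is not the official procedure from the erratum, just an illustration. It assumes a Linux box with the `msr` driver loaded (which exposes each logical CPU's MSRs as `/dev/cpu/N/msr`) and root privileges; the function names are made up for this example. Normally the BIOS or the kernel would be the one to apply the workaround, and it presumably has to be applied on every core.

```python
import os
import struct

MSR_ADDR = 0xC0011029     # MSR C001_1029, as named in the erratum workaround
WORKAROUND_BIT = 1 << 0   # bit 0


def with_workaround(value: int) -> int:
    """Return the MSR value with bit 0 set and every other bit unchanged."""
    return value | WORKAROUND_BIT


def apply_workaround(cpu: int) -> bool:
    """Set bit 0 of MSR C001_1029 on one logical CPU.

    Assumption: Linux 'msr' driver, where the file offset is the MSR
    number and each access is exactly 8 bytes. Requires root.
    Returns True if the MSR had to be changed, False if bit 0 was already set.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDWR)
    try:
        old = struct.unpack("<Q", os.pread(fd, 8, MSR_ADDR))[0]
        new = with_workaround(old)
        if new != old:
            os.pwrite(fd, struct.pack("<Q", new), MSR_ADDR)
        return new != old
    finally:
        os.close(fd)
```

Calling `apply_workaround(n)` once for each logical CPU number `n` would cover the whole machine. The write is skipped when bit 0 is already set, so running it twice is harmless.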