On Wed, Dec 16, 2009 at 4:33 PM, Gabriel Michael Black <[email protected]> wrote:
> Quoting Vince Weaver <[email protected]>:
>
>> On Wed, 16 Dec 2009, Steve Reinhardt wrote:
>>
>>> On Sun, Dec 13, 2009 at 8:57 PM, Vince Weaver <[email protected]> wrote:
>>> > I did finish running and verifying spec2k on x86_64 (it took longer than
>>> > it should have due to an unfortunate power-outage on our cluster). The
>>> > benchmarks all finished, and the retired instruction count matches actual
>>> > hardware perf counters very closely.
>>> >
>>> > http://www.csl.cornell.edu/~vince/projects/m5/m5_x86_64_se_status.html
>>>
>>> Wow, this is awesome! I missed this the first time through (didn't
>>> scroll down to the end of the message). Thanks for all the effort,
>>> Vince.
>>>
>>> Are you tracking uops as well as instructions? I'm curious how close
>>> we are on that.
>>
>> uops for m5 are currently about 1.5x too many, when compared to AMD Phenom
>> and Intel Core2 (slightly better, but not much, when compared against a
>> Pentium D).
>>
>> It's slightly worse than 1.5 on integer spec2k and slightly better on fp.

Thanks for the info... I expected we would not be as close there.

>>
>> uops are tricky to get right, I imagine the values will be off unless you
>> carefully use perf-counters and other tricks (or else have inside
>> knowledge) to match real hardware. And even then, you'd only match a
>> particular x86 implementation, there's wide variation between the various
>> generations.

Yes, this is one of the reasons that, when I was discussing with Gabe what enhanced features could be added to the ISA description, I suggested that some way of auto-generating the uop flows from a set of templates would be useful; that way you could model different approaches without having to rewrite the whole microcode. My recollection is that for the most part (of course with numerous exceptions) the bulk of the microcode is really due to the cross-product of operations and operand types, so having some templates that capture how different operand types are handled, and then being able to automatically plug in different operations, would significantly reduce the amount of manual microcode generation. It would also allow you to do something like "what if we handled this type of operand differently" and automatically apply it across the whole ISA just by changing a template or two (ideally).

>> I think PTLSim goes through a lot of trouble to make their
>> uop counts match an AMD system, but I don't know how close they manage to
>> get.
>>
>> besides retired instructions, m5 also does a good job (compared to real
>> hardware) with L1 dcache accesses. I was hoping to validate some of the
>> other stats, but it's hard to do that with OoO and detailed simulation not
>> supported on x86.
>>
>> Vince
>
> I've been thinking about this since reading your email, and it occurs
> to me the microops may be loads, ops, stores, or opstores and still
> roughly fall into a RISC style architecture.
> Stores have to wait around in the store queue anyway, so they could wait
> for their data to be generated by the ALU without a significant penalty.
> The most common sort of macroop is a load/op/store where one operand is
> in memory. In those cases, if you merge the op and the store, you'd go
> from 3 ops to 2, explaining (in this simplified version of the world) the
> 1.5x difference. If you look at the SSE instructions, this sort of single
> memory operation and computation merging is how a lot of them are
> organized, although perhaps loadops instead of opstores (I forget the
> details).

My general impression is that we go too far in terms of saying "here's a minimalistic RISC-like set of uops and I'm going to build everything out of that" as opposed to "if I'm going to implement x86, what uops should I implement to be able to do that efficiently". You may well be right about at least one source of the uop bloat, but it would be better at this point to have some hard evidence of which specific instructions or types are causing the increase.

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
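
For illustration only, the template idea and the 3-to-2 merging argument from the thread can be sketched together in a few lines of Python. Everything named here is an assumption made up for the example: the operand-kind names, the uop mnemonics (`ld`, `st`, `addst`), and the toy trace are hypothetical and are not taken from m5's ISA description or from any real x86 implementation.

```python
from itertools import product

# Hypothetical per-operand-kind templates. "{op}" is filled in for each
# operation, so the microcode is the cross-product of operations and kinds.
RISC_TEMPLATES = {
    "reg_reg": ["{op}"],
    "reg_mem": ["ld", "{op}"],        # source operand in memory
    "mem_reg": ["ld", "{op}", "st"],  # destination in memory: load/op/store
}

# "What if we handled this type of operand differently": change one template
# so the op and the store are merged into a single (hypothetical) opstore uop.
FUSED_TEMPLATES = dict(RISC_TEMPLATES, mem_reg=["ld", "{op}st"])

OPERATIONS = ["add", "sub", "and", "or", "xor"]


def expand(templates, operations):
    """Return {(operation, operand_kind): [uop, ...]} for every combination."""
    return {
        (op, kind): [uop.format(op=op) for uop in flow]
        for op, (kind, flow) in product(operations, templates.items())
    }


def uop_count(microcode, trace):
    """Total uops executed for a trace of (operation, operand_kind) pairs."""
    return sum(len(microcode[inst]) for inst in trace)


risc = expand(RISC_TEMPLATES, OPERATIONS)
fused = expand(FUSED_TEMPLATES, OPERATIONS)

# Toy trace made only of load/op/store macroops, the simplified case
# described in the thread.
trace = [("add", "mem_reg"), ("xor", "mem_reg")]
ratio = uop_count(risc, trace) / uop_count(fused, trace)
print(ratio)  # 1.5
```

Changing the single `mem_reg` template updates every operation at once, which is the point of the template approach. On a real instruction mix the ratio would depend on how often each operand kind occurs, so this sketch says nothing about where the measured 1.5x actually comes from.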
