The decode cache is used for x86. Basically, the key has to be the extended machine instruction so that all the context that determines which instruction you really want is included. In the other ISAs that key is just an integer, but in x86 it's a struct (or class, I forget which) that is a fixed-size representation of an instruction, with the various operand and address widths, processor mode, opcode length, significant opcode byte, etc. stored in it. That caches the step between the ExtMachInst and what is usually a macroop, and the macroop holds all the microops it needs in an array inside itself, so they're effectively cached too. A nice side effect is that the decode cache not only works, but on a hit you can potentially get a whole batch of microops for a single lookup.
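Roughly, the scheme looks like this. This is a simplified, hypothetical sketch just to illustrate the keying and caching idea, not the actual M5 classes; the real ExtMachInst and decode cache have more fields and machinery than this.

#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <vector>

// Fixed-size representation of a decoded x86 instruction. The same opcode
// bytes can mean different things depending on mode, prefixes, widths, etc.,
// so all of that context has to be part of the hash key.
struct ExtMachInst {
    uint8_t opcodeLen;       // significant opcode length
    uint8_t opcodeByte;      // significant opcode byte
    uint8_t mode;            // processor mode
    uint8_t opSize;          // effective operand size
    uint8_t addrSize;        // effective address size
    uint64_t immediate;
    uint64_t displacement;

    bool operator==(const ExtMachInst &o) const {
        return opcodeLen == o.opcodeLen && opcodeByte == o.opcodeByte &&
               mode == o.mode && opSize == o.opSize &&
               addrSize == o.addrSize && immediate == o.immediate &&
               displacement == o.displacement;
    }
};

struct ExtMachInstHash {
    std::size_t operator()(const ExtMachInst &emi) const {
        // Cheap hash over the most variable fields; good enough for a sketch.
        return std::hash<uint64_t>()(emi.immediate ^ emi.displacement ^
                                     (uint64_t(emi.opcodeByte) << 8) ^ emi.mode);
    }
};

struct MicroOp { /* one microop's worth of state */ };

// A macroop owns its microops, so one cache hit hands back all of them.
struct MacroOp {
    std::vector<MicroOp> microops;
};

class DecodeCache {
  public:
    std::shared_ptr<MacroOp> lookup(const ExtMachInst &emi) {
        auto it = cache.find(emi);
        if (it != cache.end())
            return it->second;           // hit: skip the real decoder entirely
        auto macro = decode(emi);        // miss: decode once and remember it
        cache[emi] = macro;
        return macro;
    }

  private:
    std::shared_ptr<MacroOp> decode(const ExtMachInst &emi) {
        // This is where the real (slow) x86 decoder would build the macroop
        // and fill in its array of microops.
        return std::make_shared<MacroOp>();
    }

    std::unordered_map<ExtMachInst, std::shared_ptr<MacroOp>,
                       ExtMachInstHash> cache;
};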
What I was thinking was to template the decoder and/or the macroops and/or the microops on the different operand widths they're going to use. There are only a few choices for those, so the instructions wouldn't have to make that decision every time they execute. I don't know how much execution time is spent on that, though; it may not be significant. One big concern I have is that you'd probably need to make the macroops templated for this scheme to work, and there are a lot of those. If there needed to be 9 copies of each, for instance, that would get fairly unmanageable. The microops wouldn't be as bad since there aren't as many of those. (There's a rough sketch of what I mean at the bottom of this message.)

Gabe

Quoting Steve Reinhardt <[email protected]>:

> Given that the base decode cache is just a hash table with no size cap and
> no replacements, I'd guess the hit rate is extremely high.
>
> Even in the unlikely event that we discover the decode cache is not as
> effective as I believe it is, I would still encourage putting our effort
> into making the cache more effective and not into making the regular
> decoder faster.
>
> Of course, this is all based on my RISC experience... do you even use the
> decode cache for x86? If not, then the obvious project is to make it work
> for variable-length instructions.
>
> Steve
>
> On Thu, Feb 5, 2009 at 12:45 AM, Gabe Black <[email protected]> wrote:
>
> > Have we ever measured a ball park hit rate for our decode cache? I'm
> > wondering how much work the decoder should do to make the instructions
> > themselves faster, and the effectiveness of the decode cache probably
> > plays a large role in that.
> >
> > Gabe
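P.S. Here's the rough sketch of the templating idea I mentioned above. It's hypothetical code, not anything in the tree; the names (MicroOpBase, AddMicroOp, makeAdd) are made up just to show how the width could be picked once when the microop is built instead of on every execute.

#include <cstdint>
#include <memory>

struct ExecContext;  // stand-in for the CPU's execution context

struct MicroOpBase {
    virtual ~MicroOpBase() = default;
    virtual void execute(ExecContext &xc) const = 0;
};

// One template, instantiated once per operand width that's actually used.
template <typename DataType>
struct AddMicroOp : MicroOpBase {
    void execute(ExecContext &xc) const override {
        // DataType fixes the width (uint8_t/uint16_t/uint32_t/uint64_t), so
        // there's no per-execution switch on operand size in here:
        //   DataType src1 = ..., src2 = ...;
        //   DataType result = src1 + src2;
        (void)xc;
    }
};

// The decoder makes the width decision exactly once, here, when it builds
// the microop; after that execute() is width-specific code.
inline std::unique_ptr<MicroOpBase> makeAdd(int opSize) {
    switch (opSize) {
      case 1:  return std::make_unique<AddMicroOp<uint8_t>>();
      case 2:  return std::make_unique<AddMicroOp<uint16_t>>();
      case 4:  return std::make_unique<AddMicroOp<uint32_t>>();
      default: return std::make_unique<AddMicroOp<uint64_t>>();
    }
}

The concern about the macroops is that each one would need an instantiation per combination of widths it supports, which is where the "9 copies of each" blowup would come from.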
