The more I look at this, the more it seems like there needs to be a way to discard the current macroop and start executing out of the ROM as the instruction itself. First, from a practical perspective, this would be more efficient than checking where a microop should come from on every fetchMicroop call; the microop would always come from the right place.
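To make the efficiency argument concrete, here is a minimal C++ sketch of a decoder that switches its microop source once instead of testing the micropc on every fetch. The names `Decoder`, `Macroop`, `MicrocodeRom`, `MicroopSource`, `startMacroop` and `switchToRom` are hypothetical, not M5's actual classes, and microops are plain strings standing in for StaticInsts. Only the name fetchMicroop is borrowed from the text; its signature here is illustrative. Because each source has its own micropc space, ROM entry points also start at 0 rather than at high addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a microop StaticInst; a name is enough for this sketch.
using Microop = std::string;

// A flat table of microops indexed by micropc.
struct MicroopSource {
    explicit MicroopSource(std::vector<Microop> uops) : microops(std::move(uops)) {}
    Microop fetchMicroop(uint16_t micropc) const { return microops.at(micropc); }

  private:
    std::vector<Microop> microops;
};

// Macroops and the ROM are both just sources, each with its own micropc space.
struct Macroop : MicroopSource { using MicroopSource::MicroopSource; };
struct MicrocodeRom : MicroopSource { using MicroopSource::MicroopSource; };

// Hypothetical decoder front end. It remembers which source is active, so
// fetchMicroop() never has to ask whether a micropc is a "high" ROM address.
class Decoder {
  public:
    explicit Decoder(const MicrocodeRom &r) : rom(r) {}

    // Begin executing a combinationally decoded macroop.
    void startMacroop(const Macroop &m) { source = &m; micropc = 0; }

    // The macroop gets out of the way: discard it and continue in the ROM.
    void switchToRom(uint16_t entry) { source = &rom; micropc = entry; }

    // Assumes startMacroop() or switchToRom() has been called first.
    Microop fetchMicroop() { return source->fetchMicroop(micropc++); }

  private:
    const MicrocodeRom &rom;
    const MicroopSource *source = nullptr;
    uint16_t micropc = 0;
};
```

The design choice is that the switch costs one pointer assignment at hand-off time, while every subsequent fetch is a single indirect call with no range check.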
Second, it simplifies control flow into the ROM. Right now, the ROM and the regular microops live in the same micropc address space, and they're distinguished by putting the ROM at high addresses. This is a problem because the micropc branch microop only supports an 8-bit immediate value for a new absolute micropc. Even if it were relative, -all- ROM targets would have to be within 255 microops of the combinational micropc space, which wouldn't work out very well.

Third, it's more like the actual hardware, where the combinational macroop does a little bit to either do the whole instruction or get out of the way, and then the ROM takes over and does its thing.

Fourth, this makes the problem of handling faults without an actual macroop easier, since it might work out that you can just go to the ROM directly. I'm not sure how well this will work out when it's all said and done, but it is an attractive idea.

This does mean, however, that the problem of caching microops that have been specialized to the environment of the original instruction (operand size and so on) is still there. It's still going to be hard to keep track of them in a sensible way. In a real machine this is easier since your microops are really microop templates which are then specialized for their context.

It also means there would need to be a new mechanism in the decoder which would allow a macroop to switch itself out for the ROM when needed. I've been toying with two ideas for how to make this happen. The first is that there would actually be two decoders. The first decoder would turn an instruction into either a stream of microops actually encoded in 1s and 0s, or bring in such a stream of 1s and 0s from an address space external to the CPU, aka an actual ROM. The second decoder would turn that stream into a series of actual microop StaticInsts which would be processed by the remainder of the CPU.
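Returning to the caching problem: one way to picture "microop templates specialized for their context" is a ROM whose entries are factory functions, with the specialized results memoized by (environment, micropc). This is only a sketch under stated assumptions: `Environment`, `SpecializedMicroop`, `MicroopTemplate` and this `MicrocodeRom` are hypothetical, a real environment would carry far more than operand and address size, and the `std::map` here grows without bound, so a real cache would need an eviction policy.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical slice of the originating instruction's environment.
struct Environment {
    int operandSize;  // in bytes
    int addressSize;  // in bytes
    bool operator<(const Environment &o) const {
        return std::tie(operandSize, addressSize) <
               std::tie(o.operandSize, o.addressSize);
    }
};

// A microop after specialization; stands in for a StaticInst.
struct SpecializedMicroop {
    std::string mnemonic;
    int operandSize;
};

// A ROM entry is a template: a recipe that builds a microop for an environment.
using MicroopTemplate =
    std::function<std::shared_ptr<const SpecializedMicroop>(const Environment &)>;

class MicrocodeRom {
  public:
    explicit MicrocodeRom(std::vector<MicroopTemplate> t) : templates(std::move(t)) {}

    // Return the microop at micropc specialized for env, building it only the
    // first time this (environment, micropc) pair is seen.
    std::shared_ptr<const SpecializedMicroop>
    fetchMicroop(uint16_t micropc, const Environment &env) {
        auto key = std::make_pair(env, micropc);
        auto it = cache.find(key);
        if (it != cache.end())
            return it->second;
        auto uop = templates.at(micropc)(env);
        ++instantiations;
        cache.emplace(key, uop);
        return uop;
    }

    int instantiations = 0;  // how many microops were actually built

  private:
    std::vector<MicroopTemplate> templates;
    std::map<std::pair<Environment, uint16_t>,
             std::shared_ptr<const SpecializedMicroop>> cache;
};
```

With this shape, the micropc indexes templates (roughly "StaticInst classes"), and repeated execution in the same environment reuses the already-built microop instead of reallocating it.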
I don't like this two-decoder approach very much because it has a lot of overhead and would complicate things for ISAs that don't want to use it. The other idea would be to make the microcode ROM a first-class citizen in the CPU, as opposed to a conceptual entity in the x86 microcode stuff, and to add a function to the ThreadContext which would change the mode of the decoder (or whatever else is splitting out microops from macroops) to start using the ROM instead.

I'd love to hear what other people think about this, so please don't hesitate to let me know. I realize my explanation might be hard to follow without being as familiar with this stuff as I am (or maybe because I didn't explain it well?), so if you'd like any clarification or a longer explanation, please let me know that too.

Gabe

Gabe Black wrote:
> In addition to needing a way to get -to- the microcode in the ROM,
> another issue to work out is how the microops in the ROM are
> represented, constructed, and returned to the decoder. An important
> decision that that hinges on is whether or not microops in the ROM will
> be specialized for the operand size, address size, target registers,
> etc. of the originating instruction like combinationally generated
> microops are. In a real system they would be. For the interrupt entering
> microcode this shouldn't be a concern since the process is basically
> independent of any particular instruction or mode of operation, except
> at the global level. In other words, you do something different enough
> in each mode to have its own implementation, but each implementation
> works in only one way. The key difference this makes is whether or not
> the microops can be generated once and kept around forever, or if a new
> StaticInst needs to be generated for every different environment the
> microops are executed in.
>
> Assuming the microops are specialized, which is the more useful,
> realistic and difficult option, several other questions come up.
> The first is exactly how the ROM, which I'm envisioning making a static
> member of the base macroop class, will generate the microops it returns
> to the macroop/decoder. Unlike the case of macroops, which statically
> allocate their microops at construction and store them in an array, the
> ROM will need to have many versions of the same microop around at the
> same time. This means that the micropc isn't indexing into StaticInst
> objects; it's really indexing into StaticInst classes, which it then
> needs to instantiate and specialize against the environment of the
> original macroop.
>
> Assuming there's a good indexing mechanism to get the right class
> built based on a micropc, the next issue is performance. If we're
> reallocating these microops every time they're executed, we might kill
> some of the performance gain we're getting from the decode cache, less
> the decode itself. We would want a cache of some sort which would be
> indexed on the emulation environment of a particular macroop and the
> micropc of the microop it was referencing. This is less of a concern
> because getting it working is the first priority, and also since these
> are exceptions, even by name, they hopefully don't happen very often and
> wouldn't affect performance hugely.
>
> If anybody has any opinion or suggestion on this or the earlier
> email about microcode, please let me know. This is the next hurdle to
> get over for x86.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev