The more I look at this, the more it seems like there needs to be a way 
to discard the current macroop and start executing out of the ROM as the 
instruction itself. First, from a practical perspective, this would be
more efficient than constantly checking where a microop should come from
on every fetchMicroop call. The microop would come from the right place
all the time.

Second, it simplifies control flow to the ROM. Right now, the ROM and
the regular microops live in the same micropc address space, and they're
distinguished by putting the ROM at high addresses. This is a problem
since the micropc branch microop only supports an 8-bit immediate value
for a new absolute micropc. Even if it were relative, that would mean
-all- ROM targets had to be within 255 microops of the combinational
micropc space, which wouldn't work out very well.
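
To make the reach problem concrete, here is a rough sketch. The names
and the ROM base value are made up for illustration and are not what M5
uses, and the relative case assumes an unsigned, forward-only 8-bit
displacement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: these names and values are illustrative only.
using MicroPC = uint16_t;
constexpr MicroPC romBase = 0x8000;  // assumed start of the ROM region

// An absolute branch whose target has to fit in an 8-bit immediate.
constexpr bool reachableAbsolute(MicroPC target)
{
    return target <= 0xFF;
}

// A relative branch with an unsigned 8-bit displacement (assumption).
constexpr bool reachableRelative(MicroPC from, MicroPC target)
{
    return target >= from && target - from <= 0xFF;
}
```

With the ROM parked at a high address, neither form of branch can get
there from a combinational microop sitting near micropc 0.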

Third, it's more like the actual hardware where the combinational
macroop does a little bit to either do the whole instruction or get out
of the way, and then the ROM takes over and does its thing.

Fourth, this makes the problem of handling faults without an actual 
macroop easier, since it might work out that you can just go to the ROM 
directly. I'm not sure how well this will work out when it's all said 
and done, but it is an attractive idea.

This does mean, however, that the problems of caching microops that have 
been specialized to the environment of the original instruction, operand 
size, blah blah, are still there. It's still going to be hard to keep 
track of them in a sensible way. In a real machine this is easier since
your microops are really microop templates which are then specialized
for their context. Also, it means there would need to be a new mechanism
in the decoder which would allow the macroops to switch themselves out 
for a ROM when needed.
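
One way the "templates specialized for their context" idea could look in
software is sketched below. This is only an illustration under my own
assumptions: Env, Microop, SpecializingRom and makePlain are all
hypothetical stand-ins and not existing M5 classes. The micropc indexes
a table of builder functions, and the built microops are cached on the
(environment, micropc) pair.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <tuple>
#include <utility>
#include <vector>

// Everything here is a hypothetical stand-in, not an M5 class.
struct Env {
    uint8_t opSize;
    uint8_t addrSize;
    bool operator<(const Env &o) const {
        return std::tie(opSize, addrSize) < std::tie(o.opSize, o.addrSize);
    }
};

struct Microop {
    Env env;       // environment this instance was specialized for
    uint16_t upc;  // micropc it was built from
};

using MicroopPtr = std::shared_ptr<Microop>;
// A "template": builds a microop for a given environment and micropc.
using Template = std::function<MicroopPtr(const Env &, uint16_t)>;

class SpecializingRom {
  public:
    explicit SpecializingRom(std::vector<Template> t)
        : templates(std::move(t)) {}

    // Return the cached microop for (env, upc), building it on a miss.
    MicroopPtr fetch(const Env &env, uint16_t upc) {
        auto key = std::make_pair(env, upc);
        auto it = cache.find(key);
        if (it != cache.end())
            return it->second;
        MicroopPtr op = templates.at(upc)(env, upc);
        cache.emplace(key, op);
        ++built;
        return op;
    }

    unsigned built = 0;  // number of microops actually constructed

  private:
    std::vector<Template> templates;
    std::map<std::pair<Env, uint16_t>, MicroopPtr> cache;
};

// A trivial template that just records its environment.
inline MicroopPtr makePlain(const Env &env, uint16_t upc)
{
    return std::make_shared<Microop>(Microop{env, upc});
}
```

Fetching the same micropc twice under the same environment builds the
microop once, while a different operand size gets its own instance. The
cache here is unbounded, which a real version would have to deal with.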

I've been toying with two ideas for how to make this happen. The first 
is that there would actually be two decoders. The first would be a 
decoder which would go to either a stream of microops actually encoded 
in 1s and 0s, or bring in a stream of 1s and 0s from an address space 
external to the CPU, aka an actual ROM. The other decoder would go from 
that to a series of actual microop StaticInsts which would get processed 
by the remainder of the CPU. I don't like this very much because it has 
a lot of overhead and would complicate things for ISAs that don't want 
to use it.

The other idea would be to make the microcode ROM a first class citizen
in the CPU, as opposed to a conceptual entity in the X86 microcode
stuff, and to add a function to the threadcontext which would change the
mode of the decoder (or whatever else is splitting out microops from
macroops) to start using the ROM instead.
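
A minimal sketch of that second idea follows, with made-up names. The
Decoder class, enterRom and startMacroop are hypothetical, and microops
are plain strings just to keep it short. The point is that the source of
microops is switched once, so fetchMicroop doesn't have to decide where
to look on every call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: a microop is just a name here.
using Microop = std::string;

class Decoder {
  public:
    explicit Decoder(std::vector<Microop> romContents)
        : rom_(std::move(romContents)), source_(&macroop_) {}

    // source_ points at a member, so copying would leave it dangling.
    Decoder(const Decoder &) = delete;
    Decoder &operator=(const Decoder &) = delete;

    // Begin a new macroop made of combinationally generated microops.
    void startMacroop(std::vector<Microop> uops) {
        macroop_ = std::move(uops);
        source_ = &macroop_;
        upc_ = 0;
    }

    // What a threadcontext hook might forward to: discard the current
    // macroop and continue from the given micropc in the ROM.
    void enterRom(uint16_t romUpc) {
        macroop_.clear();
        source_ = &rom_;
        upc_ = romUpc;
    }

    // No per-call check of where the microop lives; the source was
    // chosen when the mode changed.
    const Microop &fetchMicroop() {
        return source_->at(upc_++);
    }

  private:
    std::vector<Microop> rom_;
    std::vector<Microop> macroop_;
    const std::vector<Microop> *source_;
    uint16_t upc_ = 0;
};
```

Since the ROM has its own micropc space in this sketch, a ROM entry
point is just a small index and not a high address in a shared space.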

I'd love to hear what other people think about this, so please don't 
hesitate to let me know. I realize my explanation might be hard to 
follow without being as familiar with this stuff as I am (or maybe 
because I didn't explain it well?), so if you'd like any clarification 
or a longer explanation please let me know that too.

Gabe

Gabe Black wrote:
>     In addition to needing a way to get -to- the microcode in the ROM, 
> another issue to work out is how the microops in the ROM are 
> represented, constructed, and returned to the decoder. An important 
> decision that that hinges on is whether or not microops in the ROM will 
> be specialized for the operand size, address size, target registers, 
> etc. of the originating instruction like combinationally generated 
> microops are. In a real system they would be. For the interrupt entering 
> microcode this shouldn't be a concern since the process is basically 
> independent of any particular instruction or mode of operation, except 
> at the global level. In other words, you do something different enough 
> in each mode to have its own implementation, but each implementation 
> works in only one way. The key difference this makes is whether or not 
> the microops can be generated once and kept around forever, or if a new 
> StaticInst needs to be generated for every different environment the 
> microops are executed in.
>
>    
>     Assuming the microops are specialized, which is the more useful, 
> realistic and difficult option, several other questions come up. The 
> first is exactly how the ROM, which I'm envisioning making a static 
> member of the base macroop class, will generate the microops it returns 
> to the macroop/decoder. Unlike the case of macroops which statically 
> allocate their microops at construction and store them in an array, the 
> ROM will need to have many versions of the same microop around at the 
> same time. This means that the micropc isn't indexing into StaticInst 
> objects, it's really indexing into StaticInst classes which it then 
> needs to instantiate and specialize against the environment of the 
> original macroop.
>
>     Assuming there's a good indexing mechanism to get the right class 
> built based on a micropc, the next issue is performance. If we're 
> reallocating these microops every time they're executed, we might kill 
> some of the performance gain we're getting from the decode cache, less 
> the decode itself. We would want a cache of some sort which would be 
> indexed on the emulation environment of a particular macroop and the 
> micropc of the microop it was referencing. This is less of a concern 
> because getting it working is the first priority, and also since these 
> are exceptions, even by name, they hopefully don't happen very often and 
> wouldn't affect performance hugely.
>
>     If anybody has any opinion or suggestion on this or the earlier 
> email about microcode, please let me know. This is the next hurdle to 
> get over for x86.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>   
