Steve Reinhardt wrote:
> Sorry for the delay in responding... every time I came around to it 
> the overhead of reading through the accumulated thread seemed more and 
> more imposing.  Even after I've read through it all I'm sure I don't 
> quite understand all the issues in detail, but I'll comment anyway.
>
> I agree that trying to store the ucode ROM in some binary format and 
> then decoding it on the fly doesn't make much sense.  If we store the 
> ucode directly as a set of StaticInsts (or other objects that generate 
> specialized StaticInsts) then we've eliminated the 
> machine-code-to-StaticInst translation anyway, so not being able to 
> cache the resulting StaticInsts may not be a big deal.
>
> Perhaps we can extend the StaticInst class (or create a subclass) 
> which has a specialize() virtual method that reads state out of the 
> thread context to (potentially) generate a new StaticInst?  On the 
> other hand, in general the idea of StaticInst objects that aren't 
> truly static bothers me... it would be nice if the StaticInst could 
> truly be static, and then the specialized information could be read 
> out of the thread context on demand, with something like a DynamicInst 
> used to cache the specialized version as necessary (basically treating 
> the specialization the way we handle register renaming.)  The bad 
> thing is that we'd then need DynamicInst objects even for SimpleCPU, 
> which is a big change.  I'm not really pushing anything specific, just 
> voicing my concerns in case it triggers some ideas.
>
> Can we just use a bit in the micropc to indicate ROM mode vs 
> combinational mode?  I don't get Gabe's comment about micropc-relative 
> branches... in what case would you have branches between the two modes 
> other than the single absolute branch you need to go from 
> combinational to ROM mode?
>
> Steve
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>   
I hadn't considered that the decode function could be a dominant factor 
in the decode overhead. How much time do you think we spend actually 
allocating a StaticInst itself? In any case, it won't be as bad as it 
could be, and generating the ROM StaticInsts every time should work. 
I had also considered non-static StaticInsts and adding a 
DynamicInst-like layer, but I decided against both for the same reasons 
I think you don't like them. Either one adds a lot of complexity and 
changes a lot of code for a performance benefit that may well be dubious.
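
For what it's worth, here is a rough sketch of what "generate the ROM 
static insts every time" could look like. It is plain Python rather than 
M5 code, and every name in it (MicroOp, make_ld, make_add, UCODE_ROM, 
fetch_rom_microop, the operand_size field, the dict standing in for a 
thread context) is made up for illustration; the real StaticInst 
interface looks different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical stand-in for a specialized, immutable StaticInst."""
    mnemonic: str
    operand_size: int


def make_ld(ctx):
    # Specialize a load microop from state read out of the context.
    return MicroOp("ld", ctx["operand_size"])


def make_add(ctx):
    # Specialize an add microop the same way.
    return MicroOp("add", ctx["operand_size"])


# The "ROM" holds generators, not encoded bytes, so there is no
# machine-code-to-instruction decode step at all.
UCODE_ROM = [make_ld, make_add]


def fetch_rom_microop(micropc, ctx):
    # A fresh object is built on every fetch; nothing is cached.
    return UCODE_ROM[micropc](ctx)


first = fetch_rom_microop(0, {"operand_size": 4})
second = fetch_rom_microop(0, {"operand_size": 8})
```

The only per-fetch cost left in this sketch is the allocation itself, 
which is the part I was asking about above.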

My comment about micropc-relative branches also applies to absolute 
branches, which are what x86 actually uses right now, when branching 
between the combinational and ROM-based microops. Basically, you have to 
jump over a large swath of the micropc space to get from wherever the 
combinational microops live to the right area of the ROM, and because of 
how the microbranch is implemented, it's limited to an 8-bit immediate 
to store the offset. It forms the new micropc using either a register 
and an immediate or two registers, so you could technically put a larger 
value in a register, but that would be pretty clumsy for every instruction 
going to the ROM. Another option would be to make the microbranch 
-always- go to the ROM, but then all the macroops with branches would 
break. I'd like to be able to fix them gradually rather than take x86 
out of commission for a month. The 8-bit limit is an effect of how the 
microcode ISA from that patent is put together, so I think we should keep 
it. Even if it's painful, it should give more realistic behavior. It 
seems like I'd probably have to change the microbranches to be 
relative instead of absolute (I went with absolute since it was easier 
to assemble) so that you can branch around at the large addresses you 
might find in a ROM without needing a larger immediate there 
either. Fortunately, the branches are almost all targeted at symbolic 
labels that get munged with a python function exposed to the microcode 
listing (yeah, I'll document that at some point), so that shouldn't be 
-too- hard to change. The big exception that comes to mind is CPUID 
which computes a branch target to simulate a big case statement, sort 
of, but one instruction shouldn't be too hard to deal with.
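
To make the immediate-size problem concrete, here is a small sketch in 
Python (not M5 code). The 8-bit width comes from the discussion above; 
everything else is an assumption for illustration: that an absolute 
target is an unsigned immediate, that a relative offset is signed and 
measured from the next micropc, and that the ROM starts at a made-up 
address ROM_BASE.

```python
IMM_BITS = 8  # immediate width of the microbranch


def fits_unsigned(value, bits=IMM_BITS):
    # True if value is representable as an unsigned immediate.
    return 0 <= value < (1 << bits)


def fits_signed(value, bits=IMM_BITS):
    # True if value is representable as a two's complement immediate.
    half = 1 << (bits - 1)
    return -half <= value < half


def can_encode_absolute(target):
    # Assumption: an absolute branch stores the target micropc directly.
    return fits_unsigned(target)


def can_encode_relative(current, target):
    # Assumption: the offset is relative to the next micropc.
    return fits_signed(target - (current + 1))


ROM_BASE = 0x400  # hypothetical start of the ROM in micropc space

# A short hop between two microops that both live in the ROM.
src, dst = ROM_BASE + 10, ROM_BASE + 40
abs_ok = can_encode_absolute(dst)        # target itself is too large
rel_ok = can_encode_relative(src, dst)   # offset of 29 fits

# The jump from a combinational microop into the ROM.
cross_ok = can_encode_relative(3, ROM_BASE)  # offset of 1020 does not fit
```

Under those assumptions, relative branches fix branching around inside 
the ROM, but the jump from the combinational microops into the ROM still 
doesn't fit in the immediate either way.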

I originally wanted to use a bit in the micropc, really an offset, to 
indicate ROM vs. combinational, but there are several problems. First, 
you have to introduce this magic flag, the bit in question, to cause the 
underlying mechanism to behave differently. You might say this isn't 
any different from a memory-mapped device, but that isn't entirely 
true. In this case, using the ROM cuts off some steps at the beginning 
of the fetch-decode process which may fail or not make sense, for 
example when microcode is entering an interrupt handler. In that 
particular case, the entry point is in a table in memory, so the 
microcode needs to run to look up what the PC will be. The PC is 
undefined up to that point, so there can't be a fetch or decode from 
real instruction memory. The 
front end can't even -try- to bring in a macroop to ignore, because 
there's no way to guarantee it won't fail and fault spuriously and short 
circuit your microcode. The bit would toggle all that on and off, and 
that seems a little too mysterious to me. I think it'd be easier and/or 
better to have a separate piece of state which you toggle explicitly 
which has all those effects and has a name which clearly indicates what 
it's doing. Also, one minor thing is that you have to constantly check 
that bit to see what you should be doing since the micropc is constantly 
changing. If you had a big event that caused the switch and set things 
up and then otherwise acted normally, you could just run assuming you 
were set up to do the right thing.
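
Roughly, the two options compare like the sketch below. This is Python 
for illustration only, not M5 code; the class names, the position of 
ROM_FLAG, and the needs_macroop_fetch() helper are all hypothetical.

```python
ROM_FLAG = 1 << 15  # hypothetical position of the "ROM mode" bit


class FlagBitFrontEnd:
    """Mode is implied by a bit in the micropc, re-derived per fetch."""

    def __init__(self):
        self.mode_checks = 0

    def source_for(self, micropc):
        # The micropc changes constantly, so the bit is tested each time.
        self.mode_checks += 1
        return "rom" if micropc & ROM_FLAG else "combinational"


class ExplicitModeFrontEnd:
    """Mode is a named piece of state, toggled by one explicit event."""

    def __init__(self):
        self.in_rom = False
        self.mode_switches = 0

    def enter_rom(self):
        self.in_rom = True
        self.mode_switches += 1

    def leave_rom(self):
        self.in_rom = False
        self.mode_switches += 1

    def needs_macroop_fetch(self):
        # In ROM mode there may be no valid PC, so no macroop is fetched.
        return not self.in_rom

    def source_for(self, micropc):
        return "rom" if self.in_rom else "combinational"


flag_fe = FlagBitFrontEnd()
flag_sources = [flag_fe.source_for(ROM_FLAG | upc) for upc in range(4)]

explicit_fe = ExplicitModeFrontEnd()
explicit_fe.enter_rom()
explicit_sources = [explicit_fe.source_for(upc) for upc in range(4)]
```

Both versions deliver the same microops, but in the second one the side 
effects (like skipping the macroop fetch) hang off a switch with a name 
instead of off a magic bit in an address.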

Gabe
