Steve Reinhardt wrote:
> Sorry for the delay in responding... every time I came around to it
> the overhead of reading through the accumulated thread seemed more and
> more imposing. Even after I've read through it all I'm sure I don't
> quite understand all the issues in detail, but I'll comment anyway.
>
> I agree that trying to store the ucode ROM in some binary format and
> then decoding it on the fly doesn't make much sense. If we store the
> ucode directly as a set of StaticInsts (or other objects that generate
> specialized StaticInsts) then we've eliminated the
> machine-code-to-StaticInst translation anyway, so not being able to
> cache the resulting StaticInsts may not be a big deal.
>
> Perhaps we can extend the StaticInst class (or create a subclass)
> which has a specialize() virtual method that reads state out of the
> thread context to (potentially) generate a new StaticInst? On the
> other hand, in general the idea of StaticInst objects that aren't
> truly static bothers me... it would be nice if the StaticInst could
> truly be static, and then the specialized information could be read
> out of the thread context on demand, with something like a DynamicInst
> used to cache the specialized version as necessary (basically treating
> the specialization the way we handle register renaming.) The bad
> thing is that we'd then need DynamicInst objects even for SimpleCPU,
> which is a big change. I'm not really pushing anything specific, just
> voicing my concerns in case it triggers some ideas.
>
> Can we just use a bit in the micropc to indicate ROM mode vs
> combinational mode? I don't get Gabe's comment about micropc-relative
> branches... in what case would you have branches between the two modes
> other than the single absolute branch you need to go from
> combinational to ROM mode?
>
> Steve
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>

I hadn't considered that the decode function could be a dominant factor
in the decode overhead. How much time do you think we spend actually
allocating a StaticInst itself? In any case, it won't be as bad as it
could be, and it should work to generate the ROM static insts every time.

I had also considered non-static StaticInsts and adding a
DynamicInst-like layer, but I decided against them for the same reasons
I think you don't like them. They add a lot of complexity and change a
lot of code for what may be only a dubious performance benefit.
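
The idea above of generating the ROM static insts on every fetch, instead of decoding and caching them, can be sketched roughly as follows. This is only an illustration in Python, not M5 code: MicroOp, UcodeRom, fetch() and the two example microops are all hypothetical names, and MicroOp merely stands in for a StaticInst.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical stand-in for a StaticInst describing one microop."""
    mnemonic: str
    dest: str
    src: str


class UcodeRom:
    """ROM kept as a list of factories rather than as encoded bytes."""

    def __init__(self, entries: List[Callable[[], MicroOp]]) -> None:
        self._entries = entries
        self.allocations = 0  # how many microop objects have been built

    def fetch(self, micropc: int) -> MicroOp:
        # No machine-code decode and no cache lookup: the only cost is
        # allocating a fresh object each time the entry is fetched.
        self.allocations += 1
        return self._entries[micropc]()


# Two made-up ROM entries, purely for illustration.
rom = UcodeRom([
    lambda: MicroOp("ld", "t1", "idt_entry"),
    lambda: MicroOp("wrip", "rip", "t1"),
])

first = rom.fetch(0)
second = rom.fetch(0)
```

Here the two fetches of entry 0 yield equal but distinct objects, so the remaining per-fetch cost is the allocation itself, which is the quantity the question about StaticInst allocation time is asking about.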

My comment about micropc-relative branches also applies to absolute
branches, which is what x86 actually uses right now, when branching
between the combinational and ROM-based microops. Basically, you have to
jump over a large swath of the micropc space to get from wherever the
combinational microops live to the right area of the ROM, and because of
how the microbranch is implemented, it's limited to 8-bit immediates to
store the offset. It forms the new micropc using a register and an
immediate or two registers, so you could technically put a larger value
in a register, but that would be pretty clumsy for every instruction
going to the ROM. Another option would be to make the microbranch
-always- go to the ROM, but then all the macroops with branches would
break. I'd like to be able to fix them gradually rather than take x86
out of commission for a month.

The 8-bit limit is an effect of how the microcode ISA from that patent
is put together, so I think we should keep it. Even if it's painful, it
should give more realistic behavior. It seems like I'd probably actually
have to change the microbranches to be relative instead of absolute (I
went with absolute since it was easier to assemble) so that you can
branch around in large addresses like you might find in a ROM without
needing a larger immediate there either. Fortunately, the branches are
almost all targeted at symbolic labels that get munged with a python
function exposed to the microcode listing (yeah, I'll document that at
some point), so that shouldn't be -too- hard to change. The big
exception that comes to mind is CPUID, which computes a branch target to
simulate a big case statement, sort of, but one instruction shouldn't be
too hard to deal with.

I originally wanted to use a bit in the micropc, really an offset, to
indicate ROM vs. combinational, but there are several problems. First,
you have to introduce this magic flag, the bit in question, to cause the
underlying mechanism to behave differently.
You might say this isn't anything different from a memory-mapped device,
but that isn't entirely true. In this case, using the ROM cuts some
steps off the beginning of the fetch-decode process, steps which may
fail or not make sense in some situations, such as microcode entering an
interrupt handler. In that particular case, the entry point is in a
table in memory, so the microcode needs to run to look up what the PC
will be. The PC is undefined up to that point, so there can't be a fetch
or decode of real-life instruction memory. The front end can't even
-try- to bring in a macroop to ignore, because there's no way to
guarantee it won't fail and fault spuriously and short circuit your
microcode. The bit would toggle all that on and off, and that seems a
little too mysterious to me. I think it'd be easier and/or better to
have a separate piece of state which you toggle explicitly, which has
all those effects and has a name which clearly indicates what it's
doing.

Also, one minor thing is that you have to constantly check that bit to
see what you should be doing, since the micropc is constantly changing.
If you had a big event that caused the switch and set things up and then
otherwise acted normally, you could just run assuming you were set up to
do the right thing.

Gabe
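
To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is an illustration under assumptions, not M5 code: ROM_BIT and its position, the FrontEnd class, and the names in_rom, enter_rom(), leave_rom() and step() are all hypothetical, and "fetching a macroop" is reduced to bumping a counter.

```python
ROM_BIT = 1 << 15  # assumed position of the "magic" flag bit in the micropc


def in_rom_by_bit(micropc: int) -> bool:
    """Flag-bit approach: the mode has to be re-derived from every micropc."""
    return bool(micropc & ROM_BIT)


class FrontEnd:
    """Explicit-state approach: one named switch, flipped by a single event."""

    def __init__(self) -> None:
        self.in_rom = False      # clearly named mode, not hidden in micropc
        self.micropc = 0
        self.macro_fetches = 0   # stands in for fetching/decoding a macroop

    def enter_rom(self, entry: int) -> None:
        # The "big event": set everything up once, then run normally.
        self.in_rom = True
        self.micropc = entry

    def leave_rom(self) -> None:
        self.in_rom = False
        self.micropc = 0

    def step(self) -> None:
        # In ROM mode the front end never touches instruction memory, so a
        # spurious fetch fault cannot cut the microcode short.
        if not self.in_rom:
            self.macro_fetches += 1
        self.micropc += 1


fe = FrontEnd()
fe.enter_rom(0x40)
for _ in range(3):
    fe.step()
fetches_in_rom = fe.macro_fetches   # no macroop fetches while in the ROM
fe.leave_rom()
fe.step()
```

In the flag-bit version the mode is a property of whatever value the micropc currently holds, so it has to be consulted wherever the micropc is used. In the explicit version the mode only changes in enter_rom() and leave_rom(), and the micropc stays a plain index.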
