I think we're talking about mostly the same thing. The ROM bit would be
global, but only in the same sense that the PC is global. It carries
passively from uop to uop as they flow through, until you hit a point
where you're moving to a new macroop or into the ROM. It would be
associated with a given uop, which is already associated with a given PC
and uPC, so if you had to go back to uop X that came from the ROM, it'd
go to the right place. It'd basically be like a third, single-bit PC.
I'd like something conceptually similar to NPC to change it as well.
Maybe there would be two bools, fromRom and nextFromRom? Those names
aren't that great, but you get the idea.
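A minimal sketch of the two-bool idea, under stated assumptions: the type and method names (UopAddr, FetchState, setSequential, branchToRom, endMacroop, advance, restore) are hypothetical and are not taken from the M5 tree. Only fromRom and nextFromRom come from the paragraph above. It shows the ROM bit carried passively alongside the PC and uPC, changed through an NPC-style next value, and restored together with them when going back to an earlier uop.

```cpp
#include <cstdint>

// Hypothetical per-uop address: PC, uPC and the single-bit "third PC".
struct UopAddr {
    uint64_t pc = 0;       // macroop PC
    uint16_t upc = 0;      // micro-PC within the current flow
    bool fromRom = false;  // true if this uop was fetched from the ROM
};

// Hypothetical fetch state with current and NPC-style next values.
struct FetchState {
    uint64_t pc = 0, npc = 0;
    uint16_t upc = 0, nupc = 0;
    bool fromRom = false, nextFromRom = false;

    // Ordinary uop: the ROM bit carries over unchanged.
    void setSequential() {
        npc = pc;
        nupc = static_cast<uint16_t>(upc + 1);
        nextFromRom = fromRom;
    }

    // Microbranch into the ROM: sets the bit even if it was clear.
    void branchToRom(uint16_t target) {
        npc = pc;
        nupc = target;
        nextFromRom = true;
    }

    // Last uop of a flow: move to a new macroop and leave the ROM.
    void endMacroop(uint64_t nextPc) {
        npc = nextPc;
        nupc = 0;
        nextFromRom = false;
    }

    // Commit the next values, exactly as PC <- NPC would.
    void advance() {
        pc = npc;
        upc = nupc;
        fromRom = nextFromRom;
    }

    // Snapshot of the current uop, e.g. to keep with an in-flight uop.
    UopAddr current() const {
        UopAddr a;
        a.pc = pc;
        a.upc = upc;
        a.fromRom = fromRom;
        return a;
    }

    // Go back to uop X (e.g. after a misspeculation): all three restore.
    void restore(const UopAddr &x) {
        pc = npc = x.pc;
        upc = nupc = x.upc;
        fromRom = nextFromRom = x.fromRom;
    }
};
```

Because the bit lives next to the PC and uPC rather than in a separate global mode, restoring a saved UopAddr is enough to resume fetching from the right place, ROM or not.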
Gabe

Steve Reinhardt wrote:
> I'm a little confused... when you say microbranches are absolute, do
> you mean the target is an absolute offset within the sequence of uops
> generated by a macroinstruction?
>
> The sort of model that comes to mind based on your description is:
>
> - Use a bit somewhere *associated with the uop* that indicates whether
>   you're fetching from the ROM or not. Making this a bit in the PC
>   (whether it's a high-order bit or a low-order bit) isn't critical, but
>   it worked well for Alpha PALcode so I don't see why it's any worse of
>   an idea in this situation. I think the key is to make it per-uop and
>   not a global mode because otherwise as you mentioned in an earlier
>   email getting it fixed up right on misspeculations would be a pain.
>   Having it per-uop also lets you look at it at any stage of the
>   pipeline and still get the right answer regardless of what else is in
>   other stages of the pipe. Again, basically the same motivations for
>   Alpha encoding PAL mode in the low-order bit of the PC.
>
> - Have two flavors of microbranches: a relative microbranch (for which
>   a signed 8-bit offset probably is adequate) for branches within flows
>   (whether they're combinational decodes or from the ROM); and an
>   absolute microbranch-to-ROM that has a larger target address field
>   (probably big enough to go anywhere in the ROM) and that sets the "ROM
>   bit" for the target uop even if it wasn't previously set.
>
> Does that make sense?
>
> Steve
>
> On Tue, Sep 16, 2008 at 8:23 PM, Gabe Black <[EMAIL PROTECTED]> wrote:
>
> > I hadn't considered that the decode function could be a dominant
> > factor in the decode overhead. How much time do you think we spend
> > actually allocating a StaticInst itself? In any case, it won't be as
> > bad as it could be and it should work to generate the ROM static
> > insts every time.
> > I had also considered non-static StaticInsts and adding a
> > DynamicInst-like layer, but I decided against them for the same
> > reasons I think you don't like them. It adds a lot of complexity and
> > changes a lot of code for dubious benefit performance-wise, at least
> > possibly.
> >
> > My comment about micropc relative branches also applies to absolute
> > branches, which is what x86 actually uses right now, when branching
> > between the combinational and ROM-based microops. Basically, you
> > have to jump over a large swath of the micropc space to get from
> > wherever the combinational microops live to the right area of the
> > ROM, and because of how the microbranch is implemented, it's limited
> > to 8-bit immediates to store the offset. It forms the new micropc
> > using a register and an immediate or two registers so you could
> > technically put a larger value in a register, but that would be
> > pretty clumsy for every instruction going to the ROM. Another option
> > would be to make the microbranch -always- go to the ROM, but then
> > all the macroops with branches would break. I'd like to be able to
> > fix them gradually rather than take x86 out of commission for a
> > month. The 8-bit limit is an effect of how the microcode ISA from
> > that patent is put together so I think we should keep it. Even if
> > it's painful, it should give more realistic behavior. It seems like
> > I'd probably actually have to change the microbranches to be
> > relative instead of absolute (I went with absolute since it was
> > easier to assemble) so that you can branch around in large addresses
> > like you might find in a ROM without having to have a larger
> > immediate there either. Fortunately, the branches are almost all
> > targeted at symbolic labels that get munged with a Python function
> > exposed to the microcode listing (yeah, I'll document that at some
> > point), so that shouldn't be -too- hard to change.
> > The big exception that comes to mind is CPUID, which computes a
> > branch target to simulate a big case statement, sort of, but one
> > instruction shouldn't be too hard to deal with.
> >
> > I originally wanted to use a bit in the micropc, really an offset,
> > to indicate ROM vs. combinational, but there are several problems.
> > First, you have to introduce this magic flag, the bit in question,
> > to cause the underlying mechanism to behave differently. You might
> > say this isn't anything different than a memory-mapped device, but
> > that isn't entirely true. In this case, using the ROM cuts some
> > steps off of the beginning of the fetch-decode process which may
> > fail or not make sense, like microcoding entering an interrupt
> > handler. In that particular case, the entry point is in a table in
> > memory, so the microcode needs to run to look up what the PC will
> > be. The PC is undefined up to that point, so there can't be a fetch
> > or decode of real-life instruction memory. The front end can't even
> > -try- to bring in a macroop to ignore, because there's no way to
> > guarantee it won't fail and fault spuriously and short-circuit your
> > microcode. The bit would toggle all that on and off, and that seems
> > a little too mysterious to me. I think it'd be easier and/or better
> > to have a separate piece of state which you toggle explicitly, which
> > has all those effects and has a name which clearly indicates what
> > it's doing. Also, one minor thing is that you have to constantly
> > check that bit to see what you should be doing since the micropc is
> > constantly changing. If you had a big event that caused the switch
> > and set things up and then otherwise acted normally, you could just
> > run assuming you were set up to do the right thing.
> >
> > Gabe
> > _______________________________________________
> > m5-dev mailing list
> > [email protected]
> > http://m5sim.org/mailman/listinfo/m5-dev
