I'm a little confused... when you say microbranches are absolute, do you mean the target is an absolute offset within the sequence of uops generated by a macroinstruction?
The sort of model that comes to mind based on your description is:

- Use a bit somewhere *associated with the uop* that indicates whether you're fetching from the ROM or not. Making this a bit in the PC (whether it's a high-order bit or a low-order bit) isn't critical, but it worked well for Alpha PALcode, so I don't see why it's any worse of an idea in this situation. I think the key is to make it per-uop and not a global mode because otherwise, as you mentioned in an earlier email, getting it fixed up right on misspeculations would be a pain. Having it per-uop also lets you look at it at any stage of the pipeline and still get the right answer regardless of what else is in other stages of the pipe. Again, these are basically the same motivations Alpha had for encoding PAL mode in the low-order bit of the PC.

- Have two flavors of microbranches: a relative microbranch (for which a signed 8-bit offset is probably adequate) for branches within flows (whether they're combinational decodes or from the ROM); and an absolute microbranch-to-ROM that has a larger target address field (probably big enough to go anywhere in the ROM) and that sets the "ROM bit" for the target uop even if it wasn't previously set.

Does that make sense?

Steve

On Tue, Sep 16, 2008 at 8:23 PM, Gabe Black <[EMAIL PROTECTED]> wrote:
> I hadn't considered that the decode function could be a dominant factor in the decode overhead. How much time do you think we spend actually allocating a StaticInst itself? In any case, it won't be as bad as it could be, and it should work to generate the ROM static insts every time. I had also considered non-static StaticInsts and adding a DynamicInst-like layer, but I decided against them for the same reasons I think you don't like them. It adds a lot of complexity and changes a lot of code for a possibly dubious performance benefit.
>
> My comment about micropc-relative branches also applies to absolute branches, which is what x86 actually uses right now, when branching between the combinational and ROM-based microops. Basically, you have to jump over a large swath of the micropc space to get from wherever the combinational microops live to the right area of the ROM, and because of how the microbranch is implemented, it's limited to 8-bit immediates to store the offset. It forms the new micropc using a register and an immediate or two registers, so you could technically put a larger value in a register, but that would be pretty clumsy for every instruction going to the ROM. Another option would be to make the microbranch -always- go to the ROM, but then all the macroops with branches would break. I'd like to be able to fix them gradually rather than take x86 out of commission for a month. The 8-bit limit is an effect of how the microcode ISA from that patent is put together, so I think we should keep it. Even if it's painful, it should give more realistic behavior. It seems like I'd probably actually have to change the microbranches to be relative instead of absolute (I went with absolute since it was easier to assemble) so that you can branch around in large addresses like you might find in a ROM without having to have a larger immediate there either. Fortunately, the branches are almost all targeted at symbolic labels that get munged with a Python function exposed to the microcode listing (yeah, I'll document that at some point), so that shouldn't be -too- hard to change. The big exception that comes to mind is CPUID, which computes a branch target to simulate a big case statement, sort of, but one instruction shouldn't be too hard to deal with.
>
> I originally wanted to use a bit in the micropc, really an offset, to indicate ROM vs. combinational, but there are several problems.
> First, you have to introduce this magic flag, the bit in question, to cause the underlying mechanism to behave differently. You might say this isn't anything different than a memory-mapped device, but that isn't entirely true. In this case, using the ROM cuts some steps off of the beginning of the fetch-decode process which may fail or not make sense, like microcode entering an interrupt handler. In that particular case, the entry point is in a table in memory, so the microcode needs to run to look up what the PC will be. The PC is undefined up to that point, so there can't be a fetch or decode of real-life instruction memory. The front end can't even -try- to bring in a macroop to ignore, because there's no way to guarantee it won't fail and fault spuriously and short-circuit your microcode. The bit would toggle all that on and off, and that seems a little too mysterious to me. I think it'd be easier and/or better to have a separate piece of state which you toggle explicitly, which has all those effects and has a name which clearly indicates what it's doing. Also, one minor thing is that you have to constantly check that bit to see what you should be doing, since the micropc is constantly changing. If you had a big event that caused the switch and set things up and then otherwise acted normally, you could just run assuming you were set up to do the right thing.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>
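
To make the two-flavor microbranch model from the top of this message a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions of mine, not how M5 implements anything: the 16-bit micro-PC width, the choice of the high-order bit as the "ROM bit", taking relative offsets from the branch's own micro-PC, and all of the names (`ROM_BIT`, `in_rom`, `branch_relative`, `branch_to_rom`) are hypothetical.

```python
# Hypothetical layout: a 16-bit micro-PC whose high-order bit says
# "this uop comes from the ROM". The width and bit position are assumptions.
ROM_BIT = 1 << 15
OFFSET_MASK = ROM_BIT - 1


def in_rom(upc):
    """The ROM/combinational question is answered from the uop's own micro-PC."""
    return bool(upc & ROM_BIT)


def sign_extend8(imm):
    """Interpret the low 8 bits of imm as a signed two's-complement value."""
    imm &= 0xFF
    return imm - 0x100 if imm & 0x80 else imm


def branch_relative(upc, imm8):
    """Relative microbranch: signed 8-bit offset, ROM bit left unchanged."""
    target = ((upc & OFFSET_MASK) + sign_extend8(imm8)) & OFFSET_MASK
    return (upc & ROM_BIT) | target


def branch_to_rom(rom_addr):
    """Absolute microbranch-to-ROM: wide target field, always sets the ROM bit."""
    if not 0 <= rom_addr < ROM_BIT:
        raise ValueError("ROM address does not fit in the target field")
    return ROM_BIT | rom_addr


# A combinational flow branches back two uops, then jumps into the ROM,
# then branches forward within the ROM flow.
upc = branch_relative(3, 0xFE)    # 0xFE is -2 as a signed 8-bit offset
rom_upc = branch_to_rom(0x120)
next_rom_upc = branch_relative(rom_upc, 5)
```

Because the ROM bit travels with each micro-PC value, any pipeline stage can call something like `in_rom` on the uop it holds without consulting a global mode, which is the property I was after in the first bullet.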
