One other potential issue I just realized is where mispredicts are sent 
in, for instance, o3. If it's a PC mispredict, it should go to fetch. If 
it's a uPC mispredict and the PC isn't in decode, then it should also go 
to fetch. If it's a uPC mispredict and the PC is in decode, you could 
argue either way, but it could go to decode. If it's a mispredict (or 
interrupt or fault) that goes to someplace in the ROM, fetch is 
irrelevant since instruction memory isn't going to be used, and you'd pay 
a latency penalty going through fetch to get a value you're going to 
ignore anyway. That's not a concern now, but it's something to think 
about for the distant future when x86 works with a CPU model that can 
mispredict.
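
To make the cases above concrete, here is a rough sketch of the decision 
as a small Python function. This is not M5 code: the names (Stage, 
redirect_stage and its three flags) are made up for illustration. It also 
assumes two things the text above doesn't settle: that a redirect into 
the ROM is sent to decode, and that the ROM check comes before the others.

```python
from enum import Enum


class Stage(Enum):
    FETCH = "fetch"
    DECODE = "decode"


def redirect_stage(pc_mispredicted, target_in_rom, macroop_in_decode):
    """Pick the stage a redirect is sent to (hypothetical helper)."""
    if target_in_rom:
        # Assumption: ROM targets never read instruction memory, so going
        # through fetch would only add latency. Send them to decode.
        return Stage.DECODE
    if pc_mispredicted:
        # A new PC means new bytes are needed from instruction memory.
        return Stage.FETCH
    # From here on only the uPC was mispredicted.
    if macroop_in_decode:
        # Arguable either way; this sketch restarts at decode.
        return Stage.DECODE
    return Stage.FETCH
```

For example, redirect_stage(False, False, True) picks decode, while the 
same uPC mispredict with the macroop no longer in decode picks fetch.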

Gabe

Gabe Black wrote:
> I think we're talking about mostly the same thing. The ROM bit would be 
> global, but in the same sense that the PC is global. It carries from uop 
> to uop passively as they flow through until you hit a point where you're 
> moving to a new macroop or into the ROM. It would be associated with a 
> given uop which is already associated with a given PC and uPC, so if you 
> had to go back to uop X which came from the ROM, it'd go to the right 
> place. It'd be basically like a third, single bit PC. I'd like something 
> conceptually similar to NPC to change it as well. Maybe there would be 
> two bools, fromRom and nextFromRom? Those names aren't that great, but 
> you get the idea.
>
> Gabe
>
> Steve Reinhardt wrote:
>   
>> I'm a little confused... when you say microbranches are absolute, do 
>> you mean the target is an absolute offset within the sequence of uops 
>> generated by a macroinstruction?
>>
>> The sort of model that comes to mind based on your description is:
>>
>> - Use a bit somewhere *associated with the uop* that indicates whether 
>> you're fetching from the ROM or not.  Making this a bit in the PC 
>> (whether it's a high-order bit or a low-order bit) isn't critical, but 
>> it worked well for Alpha PALcode so I don't see why it's any worse of 
>> an idea in this situation.  I think the key is to make it per-uop and 
>> not a global mode because otherwise as you mentioned in an earlier 
>> email getting it fixed up right on misspeculations would be a pain.  
>> Having it per-uop also lets you look at it at any stage of the 
>> pipeline and still get the right answer regardless of what else is in 
>> other stages of the pipe.  Again, basically the same motivations for 
>> Alpha encoding PAL mode in the low-order bit of the PC.
>>
>> - Have two flavors of microbranches: a relative microbranch (for which 
>> a signed 8-bit offset probably is adequate) for branches within flows 
>> (whether they're combinational decodes or from the ROM); and an 
>> absolute microbranch-to-ROM that has a larger target address field 
>> (probably big enough to go anywhere in the ROM) and that sets the "ROM 
>> bit" for the target uop even if it wasn't previously set.
>>
>> Does that make sense?
>>
>> Steve
>>
>> On Tue, Sep 16, 2008 at 8:23 PM, Gabe Black <[EMAIL PROTECTED]> wrote:
>>
>>     I hadn't considered that the decode function could be a dominant
>>     factor in the decode overhead. How much time do you think we spend
>>     actually allocating a StaticInst itself? In any case, it won't be as
>>     bad as it could be, and it should work to generate the ROM static
>>     insts every time. I had also considered non-static StaticInsts and
>>     adding a DynamicInst-like layer, but I decided against them for the
>>     same reasons I think you don't like them. It adds a lot of
>>     complexity and changes a lot of code for what is possibly a dubious
>>     performance benefit.
>>
>>     My comment about micropc relative branches also applies to absolute
>>     branches, which is what x86 actually uses right now, when branching
>>     between the combinational and ROM based microops. Basically, you
>>     have to jump over a large swath of the micropc space to get from
>>     wherever the combinational microops live to the right area of the
>>     ROM, and because of how the microbranch is implemented, it's limited
>>     to 8 bit immediates to store the offset. It forms the new micropc
>>     using a register and an immediate or two registers, so you could
>>     technically put a larger value in a register, but that would be
>>     pretty clumsy for every instruction going to the ROM. Another option
>>     would be to make the microbranch -always- go to the ROM, but then
>>     all the macroops with branches would break. I'd like to be able to
>>     fix them gradually rather than take x86 out of commission for a
>>     month. The 8 bit limit is an effect of how the microcode ISA from
>>     that patent is put together, so I think we should keep it. Even if
>>     it's painful, it should give more realistic behavior. It seems like
>>     I'd probably actually have to change the microbranches to be
>>     relative instead of absolute (I went with absolute since it was
>>     easier to assemble) so that you can branch around in large addresses
>>     like you might find in a ROM without needing a larger immediate
>>     there either. Fortunately, the branches are almost all targeted at
>>     symbolic labels that get munged with a python function exposed to
>>     the microcode listing (yeah, I'll document that at some point), so
>>     that shouldn't be -too- hard to change. The big exception that comes
>>     to mind is CPUID, which computes a branch target to simulate a big
>>     case statement, sort of, but one instruction shouldn't be too hard
>>     to deal with.
>>
>>     I originally wanted to use a bit in the micropc, really an offset,
>>     to indicate ROM vs. combinational, but there are several problems.
>>     First, you have to introduce this magic flag, the bit in question,
>>     to cause the underlying mechanism to behave differently. You might
>>     say this isn't anything different than a memory mapped device, but
>>     that isn't entirely true. In this case, using the ROM cuts some
>>     steps off of the beginning of the fetch-decode process which may
>>     fail or not make sense, like microcode entering an interrupt
>>     handler. In that particular case, the entry point is in a table in
>>     memory, so the microcode needs to run to look up what the PC will
>>     be. The PC is undefined up to that point, so there can't be a fetch
>>     or decode of real life instruction memory. The front end can't even
>>     -try- to bring in a macroop to ignore, because there's no way to
>>     guarantee it won't fail and fault spuriously and short circuit your
>>     microcode. The bit would toggle all that on and off, and that seems
>>     a little too mysterious to me. I think it'd be easier and/or better
>>     to have a separate piece of state which you toggle explicitly, which
>>     has all those effects and has a name which clearly indicates what
>>     it's doing. Also, one minor thing is that you have to constantly
>>     check that bit to see what you should be doing, since the micropc is
>>     constantly changing. If you had a big event that caused the switch
>>     and set things up and then otherwise acted normally, you could just
>>     run assuming you were set up to do the right thing.
>>
>>     Gabe
>>     _______________________________________________
>>     m5-dev mailing list
>>     [email protected] <mailto:[email protected]>
>>     http://m5sim.org/mailman/listinfo/m5-dev
>>
>>

