Since replying inline to your top level review isn't possible on
reviewboard, I'll respond to that part here and the rest there. Also,
thank you to you and Nate for your reviews. I took a little longer than
I'd have liked to get back to you considering how much I pestered you
guys :-), but I didn't want to rush out a sloppy reply.

Steve Reinhardt wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/255/#review405
> -----------------------------------------------------------
>
>
> So overall this is great, I love how it gets rid of the setNextPC & 
> setNextNPC stuff all over the place, and the '#if ISA_HAS_DELAY_SLOT' in the 
> bpred, etc.  Most of my concerns are just superficial issues, like making the 
> syntax a little easier.
>   

Thanks!

>   
>> If you just
>> want the address of the current or next instruction or the current micro PC,
>> you can get those through read-only accessors on either the PCState type or
>> the *Contexts. These are instAddr(), nextInstAddr(), and microPC().
>>     
>
> What's the distinction between calling instAddr() and pc() (and similarly 
> nextInstAddr() vs npc())?  It looks to me like they're currently 
> interchangeable, yet sometimes one is used and sometimes the other.
>   

instAddr, nextInstAddr, and microPC are guaranteed to be there, read
only, and what ISA independent code is allowed to use. pc, npc, etc. are
read/write accessors, there at the whim of the ISA and for its exclusive
use in managing the PC. Another distinction which came up in our recent
discussion of the PAL bit is that instAddr and nextInstAddr are supposed
to be literal memory addresses for the instruction, hence the name
change. The "pc", which is usually equivalent but not always like in the
case of Alpha, is an ISA level concept. Part of what might be confusing
is that a lot of the time they're equal to each other. Another somewhat
contrived example would be if we were to implement an ISA with a word
addressed PC like the LC2K ISA (all the people not from MI now say "the
what?"). In an effort towards consistency and towards sharing the
implementation among ISAs, the pc, npc, etc. scheme is used throughout
as applicable.

>   
>> The PCs were previously managed entirely by the CPU which had to know about 
>> PC
>> semantics, try to figure out which dimension to increment the PC in, what to
>> set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
>> with the PC type itself. Because most of the information about how to
>> increment the PC (mainly what type of instruction it refers to) is contained
>> in the instruction object, a new advancePC virtual function was added to the
>> StaticInst class. Subclasses provide an implementation that moves around the
>> right element of the PC with a minimal amount of decision making. In ISAs 
>> like
>> Alpha, the instructions always simply assign NPC to PC without having to 
>> worry
>> about micropcs, nnpcs, etc. The added cost of a virtual function call should
>> be outweighed by not having to figure out as much about what to do with the
>> PCs and mucking around with the extra elements.
>>     
>
> Does it make sense to go through an ISA-specific function, for example:
>
> inline
> TheISA::advancePC(PCState &pcState, StaticInstPtr inst)
> {
>     pcState.advance();
> OR
>     inst->advancePC(pcState);
> }
>   

As far as pcState.advance(), no. The idea is that the instruction knows
what would follow it (a microop or a new instruction) and advances the
PC in the right dimension. As far as inst->advancePC that would
functionally work, but I'm not sure how it would be useful. Wouldn't
that just obscure what's going on?

>   
>> One drawback of making the StaticInsts advance the PC is that you have to
>> actually have one to advance the PC. This would, superficially, seem to
>> require decoding an instruction before fetch could advance. This is, as far 
>> as
>> I can tell, realistic. fetch would advance through memory addresses, not PCs,
>> perhaps predicting new memory addresses using existing ones. More
>> sophisticated decisions about control flow would be made later on, after the
>> instruction was decoded, and handed back to fetch. If branching needs to
>> happen, some amount of decoding needs to happen to see that it's a branch,
>> what the target is, etc. This could get a little more complicated if that 
>> gets
>> done by the predecoder, but I'm choosing to ignore that for now.
>>     
>
> Sounds reasonable to me.
>
>   
>> Variable length instructions:
>>
>> To handle variable length instructions in x86 and ARM, the predecoder now
>> takes in the current by reference to the getExtMachInst function. It can
>>     
>
> You're missing "PC" after "current" here.
>   

Yep. Fixed in my local copy.

>   
>> ISA parser:
>>
>> To support the new API, all PC related operand types were removed from the
>> parser and replaced with a PCState type. There are two warts on this
>> implementation. First, as with all the other operand types, the PCState still
>> has to have a valid operand type even though it doesn't use it. Second, using
>> syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
>> syntax for operand type overriding, and the parser can't figure out if you're
>> reading or writing. Instructions that use the PCS operand (which I've
>> consistently called it) need to first read it into a local variable,
>> manipulate it, and then write it back out.
>>     
>
> See my comments below... I don't see why this change is necessary or even 
> desirable as opposed to just making the existing PC/NPC/NNPC/UPC etc. 
> operands do the right thing, at least in the common cases.
>   

I would have really like to keep the old syntax since I completely agree
this new version is a bit clunky and round about. The basic problem is
that unlike in the previous system, there aren't accessors for reading
and writing parts of the PC since what those parts are isn't defined
external to the ISA. Because of that, an instruction that wants to
modify the PC needs to bring in a copy of the whole thing, do any
modifications locally, and then send the whole revised version back.
This could be done locally to update each operand, but then you'd have a
lot of wasteful copying of the PCState structure and calls to get/set it.

One possible solution would be to keep track of whether -any- member of
the PC was being accessed, read it in once, use it, and if any member
was written to write it out at the end. Then the individual components
could be accessed with operands. This could be a little limitting,
though, because the names, quantities, arguments, etc. of the accessors
and raw members aren't fixed. We'd have to have a syntax for sort of
dynamically building up an operand type on a per ISA basis to give
access to all the bits of the PC. This would all be possible, but I
think is outside the scope of this already very large change.

>   
>> The return address stack needed a little extra help because, in the presence
>> of branch delay slots, it has to merge together elements of the return PC and
>> the call PC. To handle that, a buildRetPC utility function was added. There
>> are basically only two versions in all the ISAs, but it didn't seem short
>> enough to put into the generic ISA directory. Also, the branch predictor code
>> in O3 and InOrder were adjusted so that they always store the PC of the 
>> actual
>> call instruction in the RAS, not the next PC. If the call instruction is a
>> microop, the next PC refers to the next microop in the same macroop which is
>> probably not desirable. The buildRetPC function advances the PC intelligently
>> to the next macroop (in an ISA specific way) so that that case works.
>>     
>
> I don't know that it matters, but just as a sanity check: wouldn't it be 
> equivalent to call buildRetPC on the call PC beforehand, then store the 
> result in the RAS?  I.e., it's not what gets stored that matters, it's using 
> the buildRetPC() logic as opposed to just using nextPC()?
>
>   

Yeah, I think that's functionally equivalent. I expect this way would
have slightly better performance because it defers the work until you
know you need it, but either way should work.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to