This email has a lot of good tidbits in it that I would like to see as doxygen comments in the ptstate base class.
Thanks, Ali On Oct 24, 2010, at 2:31 AM, Gabe Black wrote: > Since replying inline to your top level review isn't possible on > reviewboard, I'll respond to that part here and the rest there. Also, > thank you to you and Nate for your reviews. I took a little longer than > I'd have liked to get back to you considering how much I pestered you > guys :-), but I didn't want to rush out a sloppy reply. > > Steve Reinhardt wrote: >> ----------------------------------------------------------- >> This is an automatically generated e-mail. To reply, visit: >> http://reviews.m5sim.org/r/255/#review405 >> ----------------------------------------------------------- >> >> >> So overall this is great, I love how it gets rid of the setNextPC & >> setNextNPC stuff all over the place, and the '#if ISA_HAS_DELAY_SLOT' in the >> bpred, etc. Most of my concerns are just superficial issues, like making >> the syntax a little easier. >> > > Thanks! > >> >>> If you just >>> want the address of the current or next instruction or the current micro PC, >>> you can get those through read-only accessors on either the PCState type or >>> the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). >>> >> >> What's the distinction between calling instAddr() and pc() (and similarly >> nextInstAddr() vs npc())? It looks to me like they're currently >> interchangeable, yet sometimes one is used and sometimes the other. >> > > instAddr, nextInstAddr, and microPC are guaranteed to be there, read > only, and what ISA independent code is allowed to use. pc, npc, etc. are > read/write accessors, there at the whim of the ISA and for its exclusive > use in managing the PC. Another distinction which came up in our recent > discussion of the PAL bit is that instAddr and nextInstAddr are supposed > to be literal memory addresses for the instruction, hence the name > change. The "pc", which is usually equivalent but not always like in the > case of Alpha, is an ISA level concept. Part of what might be confusing > is that a lot of the time they're equal to each other. Another somewhat > contrived example would be if we were to implement an ISA with a word > addressed PC like the LC2K ISA (all the people not from MI now say "the > what?"). In an effort towards consistency and towards sharing the > implementation among ISAs, the pc, npc, etc. scheme is used throughout > as applicable. > >> >>> The PCs were previously managed entirely by the CPU which had to know about >>> PC >>> semantics, try to figure out which dimension to increment the PC in, what to >>> set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction >>> with the PC type itself. Because most of the information about how to >>> increment the PC (mainly what type of instruction it refers to) is contained >>> in the instruction object, a new advancePC virtual function was added to the >>> StaticInst class. Subclasses provide an implementation that moves around the >>> right element of the PC with a minimal amount of decision making. In ISAs >>> like >>> Alpha, the instructions always simply assign NPC to PC without having to >>> worry >>> about micropcs, nnpcs, etc. The added cost of a virtual function call should >>> be outweighed by not having to figure out as much about what to do with the >>> PCs and mucking around with the extra elements. >>> >> >> Does it make sense to go through an ISA-specific function, for example: >> >> inline >> TheISA::advancePC(PCState &pcState, StaticInstPtr inst) >> { >> pcState.advance(); >> OR >> inst->advancePC(pcState); >> } >> > > As far as pcState.advance(), no. The idea is that the instruction knows > what would follow it (a microop or a new instruction) and advances the > PC in the right dimension. As far as inst->advancePC that would > functionally work, but I'm not sure how it would be useful. Wouldn't > that just obscure what's going on? > >> >>> One drawback of making the StaticInsts advance the PC is that you have to >>> actually have one to advance the PC. This would, superficially, seem to >>> require decoding an instruction before fetch could advance. This is, as far >>> as >>> I can tell, realistic. fetch would advance through memory addresses, not >>> PCs, >>> perhaps predicting new memory addresses using existing ones. More >>> sophisticated decisions about control flow would be made later on, after the >>> instruction was decoded, and handed back to fetch. If branching needs to >>> happen, some amount of decoding needs to happen to see that it's a branch, >>> what the target is, etc. This could get a little more complicated if that >>> gets >>> done by the predecoder, but I'm choosing to ignore that for now. >>> >> >> Sounds reasonable to me. >> >> >>> Variable length instructions: >>> >>> To handle variable length instructions in x86 and ARM, the predecoder now >>> takes in the current by reference to the getExtMachInst function. It can >>> >> >> You're missing "PC" after "current" here. >> > > Yep. Fixed in my local copy. > >> >>> ISA parser: >>> >>> To support the new API, all PC related operand types were removed from the >>> parser and replaced with a PCState type. There are two warts on this >>> implementation. First, as with all the other operand types, the PCState >>> still >>> has to have a valid operand type even though it doesn't use it. Second, >>> using >>> syntax like PCS.npc(target) doesn't work for two reasons, this looks like >>> the >>> syntax for operand type overriding, and the parser can't figure out if >>> you're >>> reading or writing. Instructions that use the PCS operand (which I've >>> consistently called it) need to first read it into a local variable, >>> manipulate it, and then write it back out. >>> >> >> See my comments below... I don't see why this change is necessary or even >> desirable as opposed to just making the existing PC/NPC/NNPC/UPC etc. >> operands do the right thing, at least in the common cases. >> > > I would have really like to keep the old syntax since I completely agree > this new version is a bit clunky and round about. The basic problem is > that unlike in the previous system, there aren't accessors for reading > and writing parts of the PC since what those parts are isn't defined > external to the ISA. Because of that, an instruction that wants to > modify the PC needs to bring in a copy of the whole thing, do any > modifications locally, and then send the whole revised version back. > This could be done locally to update each operand, but then you'd have a > lot of wasteful copying of the PCState structure and calls to get/set it. > > One possible solution would be to keep track of whether -any- member of > the PC was being accessed, read it in once, use it, and if any member > was written to write it out at the end. Then the individual components > could be accessed with operands. This could be a little limitting, > though, because the names, quantities, arguments, etc. of the accessors > and raw members aren't fixed. We'd have to have a syntax for sort of > dynamically building up an operand type on a per ISA basis to give > access to all the bits of the PC. This would all be possible, but I > think is outside the scope of this already very large change. > >> >>> The return address stack needed a little extra help because, in the presence >>> of branch delay slots, it has to merge together elements of the return PC >>> and >>> the call PC. To handle that, a buildRetPC utility function was added. There >>> are basically only two versions in all the ISAs, but it didn't seem short >>> enough to put into the generic ISA directory. Also, the branch predictor >>> code >>> in O3 and InOrder were adjusted so that they always store the PC of the >>> actual >>> call instruction in the RAS, not the next PC. If the call instruction is a >>> microop, the next PC refers to the next microop in the same macroop which is >>> probably not desirable. The buildRetPC function advances the PC >>> intelligently >>> to the next macroop (in an ISA specific way) so that that case works. >>> >> >> I don't know that it matters, but just as a sanity check: wouldn't it be >> equivalent to call buildRetPC on the call PC beforehand, then store the >> result in the RAS? I.e., it's not what gets stored that matters, it's using >> the buildRetPC() logic as opposed to just using nextPC()? >> >> > > Yeah, I think that's functionally equivalent. I expect this way would > have slightly better performance because it defers the work until you > know you need it, but either way should work. > > Gabe > _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev > _______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
