It looks like I confused reviewboard. My replies are there, though, if you follow the link.
Gabe On 10/24/10 00:30, Gabe Black wrote: > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://reviews.m5sim.org/r/255/#review413 > ----------------------------------------------------------- > > > - Gabe > > > On 2010-10-14 00:01:15, Gabe Black wrote: >> ----------------------------------------------------------- >> This is an automatically generated e-mail. To reply, visit: >> http://reviews.m5sim.org/r/255/ >> ----------------------------------------------------------- >> >> (Updated 2010-10-14 00:01:15) >> >> >> Review request for Default. >> >> >> Summary >> ------- >> >> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. >> >> >> This change is a low level and pervasive reorganization of how PCs are >> managed >> in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry >> about, >> the PC and the NPC, and the lsb of the PC signaled whether or not you were in >> PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next >> micropc, x86 and ARM introduced variable length instruction sets, and ARM >> started to keep track of mode bits in the PC. Each CPU model handled PCs in >> its own custom way that needed to be updated individually to handle the new >> dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, >> the complexity could be hidden in the ISA at the ISA implementation's >> expense. >> Areas like the branch predictor hadn't been updated to handle branch delay >> slots or micropcs, and it turns out that had introduced a significant (10s of >> percent) performance bug in SPARC and to a lesser extend MIPS. Rather than >> perpetuate the problem by reworking O3 again to handle the PC features needed >> by x86, this change was introduced to rework PC handling in a more modular, >> transparent, and hopefully efficient way. >> >> >> PC type: >> >> Rather than having the superset of all possible elements of PC state declared >> in each of the CPU models, each ISA defines its own PCState type which has >> exactly the elements it needs. A cross product of canned PCState classes are >> defined in the new "generic" ISA directory for ISAs with/without delay slots >> and microcode. These are either typedef-ed or subclassed by each ISA. To read >> or write this structure through a *Context, you use the new pcState() >> accessor >> which reads or writes depending on whether it has an argument. If you just >> want the address of the current or next instruction or the current micro PC, >> you can get those through read-only accessors on either the PCState type or >> the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the >> move away from readPC. That name is ambiguous since it's not clear whether or >> not it should be the actual address to fetch from, or if it should have extra >> bits in it like the PAL mode bit. Each class is free to define its own >> functions to get at whatever values it needs however it needs to to be used >> in >> ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the >> PC and into a separate field like ARM. >> >> These types can be reset to a particular pc (where npc = pc + >> sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as >> appropriate), printed, serialized, and compared. There is a branching() >> function which encapsulates code in the CPU models that checked if an >> instruction branched or not. Exactly what that means in the context of branch >> delay slots which can skip an instruction when not taken is ambiguous, and >> ideally this function and its uses can be eliminated. PCStates also generally >> know how to advance themselves in various ways depending on if they point at >> an instruction, a microop, or the last microop of a macroop. More on that >> later. >> >> Ideally, accessing all the PCs at once when setting them will improve >> performance of M5 even though more data needs to be moved around. This is >> because often all the PCs need to be manipulated together, and by getting >> them >> all at once you avoid multiple function calls. Also, the PCs of a particular >> thread will have spatial locality in the cache. Previously they were grouped >> by element in arrays which spread out accesses. >> >> >> Advancing the PC: >> >> The PCs were previously managed entirely by the CPU which had to know about >> PC >> semantics, try to figure out which dimension to increment the PC in, what to >> set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction >> with the PC type itself. Because most of the information about how to >> increment the PC (mainly what type of instruction it refers to) is contained >> in the instruction object, a new advancePC virtual function was added to the >> StaticInst class. Subclasses provide an implementation that moves around the >> right element of the PC with a minimal amount of decision making. In ISAs >> like >> Alpha, the instructions always simply assign NPC to PC without having to >> worry >> about micropcs, nnpcs, etc. The added cost of a virtual function call should >> be outweighed by not having to figure out as much about what to do with the >> PCs and mucking around with the extra elements. >> >> One drawback of making the StaticInsts advance the PC is that you have to >> actually have one to advance the PC. This would, superficially, seem to >> require decoding an instruction before fetch could advance. This is, as far >> as >> I can tell, realistic. fetch would advance through memory addresses, not PCs, >> perhaps predicting new memory addresses using existing ones. More >> sophisticated decisions about control flow would be made later on, after the >> instruction was decoded, and handed back to fetch. If branching needs to >> happen, some amount of decoding needs to happen to see that it's a branch, >> what the target is, etc. This could get a little more complicated if that >> gets >> done by the predecoder, but I'm choosing to ignore that for now. >> >> >> Variable length instructions: >> >> To handle variable length instructions in x86 and ARM, the predecoder now >> takes in the current by reference to the getExtMachInst function. It can >> modify the PC however it needs to (by setting NPC to be the PC + instruction >> length, for instance). This could be improved since the CPU doesn't know if >> the PC was modified and always has to write it back. >> >> >> ISA parser: >> >> To support the new API, all PC related operand types were removed from the >> parser and replaced with a PCState type. There are two warts on this >> implementation. First, as with all the other operand types, the PCState still >> has to have a valid operand type even though it doesn't use it. Second, using >> syntax like PCS.npc(target) doesn't work for two reasons, this looks like the >> syntax for operand type overriding, and the parser can't figure out if you're >> reading or writing. Instructions that use the PCS operand (which I've >> consistently called it) need to first read it into a local variable, >> manipulate it, and then write it back out. >> >> >> Return address stack: >> >> The return address stack needed a little extra help because, in the presence >> of branch delay slots, it has to merge together elements of the return PC and >> the call PC. To handle that, a buildRetPC utility function was added. There >> are basically only two versions in all the ISAs, but it didn't seem short >> enough to put into the generic ISA directory. Also, the branch predictor code >> in O3 and InOrder were adjusted so that they always store the PC of the >> actual >> call instruction in the RAS, not the next PC. If the call instruction is a >> microop, the next PC refers to the next microop in the same macroop which is >> probably not desirable. The buildRetPC function advances the PC intelligently >> to the next macroop (in an ISA specific way) so that that case works. >> >> >> Change in stats: >> >> There were no change in stats except in MIPS and SPARC in the O3 model. MIPS >> runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which >> could >> likely be improved further by setting call/return instruction flags and >> taking >> advantage of the RAS. >> >> >> TODO: >> >> Add != operators to the PCState classes, defined trivially to be !(a==b). >> Smooth out places where PCs are split apart, passed around, and put back >> together later. I think this might happen in SPARC's fault code. Add ISA >> specific constructors that allow setting PC elements without calling a bunch >> of accessors. Try to eliminate the need for the branching() function. Factor >> out Alpha's PAL mode pc bit into a separate flag field, and eliminate places >> where it's blindly masked out or tested in the PC. >> >> >> Diffs >> ----- >> >> src/arch/alpha/ev5.cc addb847910d2 >> src/arch/alpha/faults.cc addb847910d2 >> src/arch/alpha/interrupts.hh addb847910d2 >> src/arch/alpha/isa/branch.isa addb847910d2 >> src/arch/alpha/isa/decoder.isa addb847910d2 >> src/arch/alpha/isa/main.isa addb847910d2 >> src/arch/alpha/predecoder.hh addb847910d2 >> src/arch/alpha/process.cc addb847910d2 >> src/arch/alpha/remote_gdb.cc addb847910d2 >> src/arch/alpha/stacktrace.cc addb847910d2 >> src/arch/alpha/tlb.cc addb847910d2 >> src/arch/alpha/types.hh addb847910d2 >> src/arch/alpha/utility.hh addb847910d2 >> src/arch/alpha/utility.cc addb847910d2 >> src/arch/arm/faults.cc addb847910d2 >> src/arch/arm/insts/macromem.hh addb847910d2 >> src/arch/arm/insts/mem.hh addb847910d2 >> src/arch/arm/insts/pred_inst.hh addb847910d2 >> src/arch/arm/insts/static_inst.hh addb847910d2 >> src/arch/arm/insts/vfp.hh addb847910d2 >> src/arch/arm/isa.cc addb847910d2 >> src/arch/arm/isa/formats/breakpoint.isa addb847910d2 >> src/arch/arm/isa/insts/branch.isa addb847910d2 >> src/arch/arm/isa/insts/data.isa addb847910d2 >> src/arch/arm/isa/insts/ldr.isa addb847910d2 >> src/arch/arm/isa/insts/macromem.isa addb847910d2 >> src/arch/arm/isa/insts/misc.isa addb847910d2 >> src/arch/arm/isa/operands.isa addb847910d2 >> src/arch/arm/isa_traits.hh addb847910d2 >> src/arch/arm/linux/system.cc addb847910d2 >> src/arch/arm/nativetrace.cc addb847910d2 >> src/arch/arm/predecoder.hh addb847910d2 >> src/arch/arm/predecoder.cc addb847910d2 >> src/arch/arm/process.cc addb847910d2 >> src/arch/arm/system.hh addb847910d2 >> src/arch/arm/table_walker.cc addb847910d2 >> src/arch/arm/tlb.cc addb847910d2 >> src/arch/arm/types.hh addb847910d2 >> src/arch/arm/utility.hh addb847910d2 >> src/arch/arm/utility.cc addb847910d2 >> src/arch/generic/types.hh PRE-CREATION >> src/arch/isa_parser.py addb847910d2 >> src/arch/mips/isa/base.isa addb847910d2 >> src/arch/mips/isa/decoder.isa addb847910d2 >> src/arch/mips/isa/formats/branch.isa addb847910d2 >> src/arch/mips/isa/includes.isa addb847910d2 >> src/arch/mips/isa/operands.isa addb847910d2 >> src/arch/mips/mt.hh addb847910d2 >> src/arch/mips/predecoder.hh addb847910d2 >> src/arch/mips/process.cc addb847910d2 >> src/arch/mips/types.hh addb847910d2 >> src/arch/mips/utility.hh addb847910d2 >> src/arch/mips/utility.cc addb847910d2 >> src/arch/power/insts/branch.hh addb847910d2 >> src/arch/power/insts/branch.cc addb847910d2 >> src/arch/power/insts/static_inst.hh addb847910d2 >> src/arch/power/isa/decoder.isa addb847910d2 >> src/arch/power/isa/formats/branch.isa addb847910d2 >> src/arch/power/isa/formats/unknown.isa addb847910d2 >> src/arch/power/isa/operands.isa addb847910d2 >> src/arch/power/predecoder.hh addb847910d2 >> src/arch/power/process.cc addb847910d2 >> src/arch/power/types.hh addb847910d2 >> src/arch/power/utility.hh addb847910d2 >> src/arch/power/utility.cc addb847910d2 >> src/arch/sparc/faults.cc addb847910d2 >> src/arch/sparc/isa/base.isa addb847910d2 >> src/arch/sparc/isa/decoder.isa addb847910d2 >> src/arch/sparc/isa/formats/branch.isa addb847910d2 >> src/arch/sparc/isa/formats/micro.isa addb847910d2 >> src/arch/sparc/isa/operands.isa addb847910d2 >> src/arch/sparc/nativetrace.cc addb847910d2 >> src/arch/sparc/predecoder.hh addb847910d2 >> src/arch/sparc/process.cc addb847910d2 >> src/arch/sparc/remote_gdb.cc addb847910d2 >> src/arch/sparc/types.hh addb847910d2 >> src/arch/sparc/utility.hh addb847910d2 >> src/arch/sparc/utility.cc addb847910d2 >> src/arch/x86/faults.cc addb847910d2 >> src/arch/x86/insts/macroop.hh addb847910d2 >> src/arch/x86/insts/microop.hh addb847910d2 >> src/arch/x86/insts/static_inst.hh addb847910d2 >> src/arch/x86/isa/decoder/two_byte_opcodes.isa addb847910d2 >> src/arch/x86/isa/formats/unknown.isa addb847910d2 >> src/arch/x86/isa/microops/regop.isa addb847910d2 >> src/arch/x86/isa/microops/seqop.isa addb847910d2 >> src/arch/x86/isa/operands.isa addb847910d2 >> src/arch/x86/nativetrace.cc addb847910d2 >> src/arch/x86/predecoder.hh addb847910d2 >> src/arch/x86/process.cc addb847910d2 >> src/arch/x86/system.cc addb847910d2 >> src/arch/x86/tlb.cc addb847910d2 >> src/arch/x86/types.hh addb847910d2 >> src/arch/x86/utility.hh addb847910d2 >> src/arch/x86/utility.cc addb847910d2 >> src/base/remote_gdb.cc addb847910d2 >> src/base/types.hh addb847910d2 >> src/cpu/base_dyn_inst.hh addb847910d2 >> src/cpu/base_dyn_inst_impl.hh addb847910d2 >> src/cpu/checker/cpu.hh addb847910d2 >> src/cpu/exetrace.hh addb847910d2 >> src/cpu/exetrace.cc addb847910d2 >> src/cpu/inorder/comm.hh addb847910d2 >> src/cpu/inorder/cpu.hh addb847910d2 >> src/cpu/inorder/cpu.cc addb847910d2 >> src/cpu/inorder/first_stage.cc addb847910d2 >> src/cpu/inorder/inorder_dyn_inst.hh addb847910d2 >> src/cpu/inorder/inorder_dyn_inst.cc addb847910d2 >> src/cpu/inorder/inorder_trace.hh addb847910d2 >> src/cpu/inorder/inorder_trace.cc addb847910d2 >> src/cpu/inorder/pipeline_stage.hh addb847910d2 >> src/cpu/inorder/pipeline_stage.cc addb847910d2 >> src/cpu/inorder/resource_pool.9stage.cc addb847910d2 >> src/cpu/inorder/resources/bpred_unit.hh addb847910d2 >> src/cpu/inorder/resources/bpred_unit.cc addb847910d2 >> src/cpu/inorder/resources/branch_predictor.cc addb847910d2 >> src/cpu/inorder/resources/cache_unit.cc addb847910d2 >> src/cpu/inorder/resources/execution_unit.cc addb847910d2 >> src/cpu/inorder/resources/fetch_seq_unit.hh addb847910d2 >> src/cpu/inorder/resources/fetch_seq_unit.cc addb847910d2 >> src/cpu/inorder/resources/tlb_unit.hh addb847910d2 >> src/cpu/inorder/thread_context.hh addb847910d2 >> src/cpu/inorder/thread_context.cc addb847910d2 >> src/cpu/inorder/thread_state.hh addb847910d2 >> src/cpu/inteltrace.hh addb847910d2 >> src/cpu/inteltrace.cc addb847910d2 >> src/cpu/legiontrace.hh addb847910d2 >> src/cpu/legiontrace.cc addb847910d2 >> src/cpu/nativetrace.hh addb847910d2 >> src/cpu/o3/bpred_unit.hh addb847910d2 >> src/cpu/o3/bpred_unit_impl.hh addb847910d2 >> src/cpu/o3/comm.hh addb847910d2 >> src/cpu/o3/commit.hh addb847910d2 >> src/cpu/o3/commit_impl.hh addb847910d2 >> src/cpu/o3/cpu.hh addb847910d2 >> src/cpu/o3/cpu.cc addb847910d2 >> src/cpu/o3/decode_impl.hh addb847910d2 >> src/cpu/o3/dep_graph.hh addb847910d2 >> src/cpu/o3/dyn_inst.hh addb847910d2 >> src/cpu/o3/dyn_inst_impl.hh addb847910d2 >> src/cpu/o3/fetch.hh addb847910d2 >> src/cpu/o3/fetch_impl.hh addb847910d2 >> src/cpu/o3/iew_impl.hh addb847910d2 >> src/cpu/o3/inst_queue_impl.hh addb847910d2 >> src/cpu/o3/lsq_unit.hh addb847910d2 >> src/cpu/o3/lsq_unit_impl.hh addb847910d2 >> src/cpu/o3/mem_dep_unit_impl.hh addb847910d2 >> src/cpu/o3/rename_impl.hh addb847910d2 >> src/cpu/o3/rob_impl.hh addb847910d2 >> src/cpu/o3/thread_context.hh addb847910d2 >> src/cpu/o3/thread_context_impl.hh addb847910d2 >> src/cpu/pc_event.cc addb847910d2 >> src/cpu/pred/btb.hh addb847910d2 >> src/cpu/pred/btb.cc addb847910d2 >> src/cpu/pred/ras.hh addb847910d2 >> src/cpu/pred/ras.cc addb847910d2 >> src/cpu/simple/atomic.cc addb847910d2 >> src/cpu/simple/base.hh addb847910d2 >> src/cpu/simple/base.cc addb847910d2 >> src/cpu/simple/timing.cc addb847910d2 >> src/cpu/simple_thread.hh addb847910d2 >> src/cpu/simple_thread.cc addb847910d2 >> src/cpu/static_inst.hh addb847910d2 >> src/cpu/static_inst.cc addb847910d2 >> src/cpu/thread_context.hh addb847910d2 >> src/cpu/thread_context.cc addb847910d2 >> src/kern/system_events.cc addb847910d2 >> src/kern/tru64/tru64.hh addb847910d2 >> src/sim/arguments.hh addb847910d2 >> src/sim/faults.cc addb847910d2 >> src/sim/insttracer.hh addb847910d2 >> src/sim/syscall_emul.hh addb847910d2 >> src/sim/syscall_emul.cc addb847910d2 >> >> Diff: http://reviews.m5sim.org/r/255/diff >> >> >> Testing >> ------- >> >> Quick regressions >> At various times long regressions >> Running all regressions now >> Trace output while debugging >> >> >> Thanks, >> >> Gabe >> >> > _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev _______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
