----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/255/#review426 -----------------------------------------------------------
src/arch/arm/faults.cc <http://reviews.m5sim.org/r/255/#comment634> ahh sure... only other thing is that this does't compile.... curPC doesn't exist, but curPc does. - Ali On 2010-10-27 21:48:56, Gabe Black wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://reviews.m5sim.org/r/255/ > ----------------------------------------------------------- > > (Updated 2010-10-27 21:48:56) > > > Review request for Default. > > > Summary > ------- > > ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors. > > > This change is a low level and pervasive reorganization of how PCs are managed > in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, > the PC and the NPC, and the lsb of the PC signaled whether or not you were in > PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next > micropc, x86 and ARM introduced variable length instruction sets, and ARM > started to keep track of mode bits in the PC. Each CPU model handled PCs in > its own custom way that needed to be updated individually to handle the new > dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, > the complexity could be hidden in the ISA at the ISA implementation's expense. > Areas like the branch predictor hadn't been updated to handle branch delay > slots or micropcs, and it turns out that had introduced a significant (10s of > percent) performance bug in SPARC and to a lesser extend MIPS. Rather than > perpetuate the problem by reworking O3 again to handle the PC features needed > by x86, this change was introduced to rework PC handling in a more modular, > transparent, and hopefully efficient way. > > > PC type: > > Rather than having the superset of all possible elements of PC state declared > in each of the CPU models, each ISA defines its own PCState type which has > exactly the elements it needs. A cross product of canned PCState classes are > defined in the new "generic" ISA directory for ISAs with/without delay slots > and microcode. These are either typedef-ed or subclassed by each ISA. To read > or write this structure through a *Context, you use the new pcState() accessor > which reads or writes depending on whether it has an argument. If you just > want the address of the current or next instruction or the current micro PC, > you can get those through read-only accessors on either the PCState type or > the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the > move away from readPC. That name is ambiguous since it's not clear whether or > not it should be the actual address to fetch from, or if it should have extra > bits in it like the PAL mode bit. Each class is free to define its own > functions to get at whatever values it needs however it needs to to be used in > ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the > PC and into a separate field like ARM. > > These types can be reset to a particular pc (where npc = pc + > sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as > appropriate), printed, serialized, and compared. There is a branching() > function which encapsulates code in the CPU models that checked if an > instruction branched or not. Exactly what that means in the context of branch > delay slots which can skip an instruction when not taken is ambiguous, and > ideally this function and its uses can be eliminated. PCStates also generally > know how to advance themselves in various ways depending on if they point at > an instruction, a microop, or the last microop of a macroop. More on that > later. > > Ideally, accessing all the PCs at once when setting them will improve > performance of M5 even though more data needs to be moved around. This is > because often all the PCs need to be manipulated together, and by getting them > all at once you avoid multiple function calls. Also, the PCs of a particular > thread will have spatial locality in the cache. Previously they were grouped > by element in arrays which spread out accesses. > > > Advancing the PC: > > The PCs were previously managed entirely by the CPU which had to know about PC > semantics, try to figure out which dimension to increment the PC in, what to > set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction > with the PC type itself. Because most of the information about how to > increment the PC (mainly what type of instruction it refers to) is contained > in the instruction object, a new advancePC virtual function was added to the > StaticInst class. Subclasses provide an implementation that moves around the > right element of the PC with a minimal amount of decision making. In ISAs like > Alpha, the instructions always simply assign NPC to PC without having to worry > about micropcs, nnpcs, etc. The added cost of a virtual function call should > be outweighed by not having to figure out as much about what to do with the > PCs and mucking around with the extra elements. > > One drawback of making the StaticInsts advance the PC is that you have to > actually have one to advance the PC. This would, superficially, seem to > require decoding an instruction before fetch could advance. This is, as far as > I can tell, realistic. fetch would advance through memory addresses, not PCs, > perhaps predicting new memory addresses using existing ones. More > sophisticated decisions about control flow would be made later on, after the > instruction was decoded, and handed back to fetch. If branching needs to > happen, some amount of decoding needs to happen to see that it's a branch, > what the target is, etc. This could get a little more complicated if that gets > done by the predecoder, but I'm choosing to ignore that for now. > > > Variable length instructions: > > To handle variable length instructions in x86 and ARM, the predecoder now > takes in the current PC by reference to the getExtMachInst function. It can > modify the PC however it needs to (by setting NPC to be the PC + instruction > length, for instance). This could be improved since the CPU doesn't know if > the PC was modified and always has to write it back. > > > ISA parser: > > To support the new API, all PC related operand types were removed from the > parser and replaced with a PCState type. There are two warts on this > implementation. First, as with all the other operand types, the PCState still > has to have a valid operand type even though it doesn't use it. Second, using > syntax like PCS.npc(target) doesn't work for two reasons, this looks like the > syntax for operand type overriding, and the parser can't figure out if you're > reading or writing. Instructions that use the PCS operand (which I've > consistently called it) need to first read it into a local variable, > manipulate it, and then write it back out. > > > Return address stack: > > The return address stack needed a little extra help because, in the presence > of branch delay slots, it has to merge together elements of the return PC and > the call PC. To handle that, a buildRetPC utility function was added. There > are basically only two versions in all the ISAs, but it didn't seem short > enough to put into the generic ISA directory. Also, the branch predictor code > in O3 and InOrder were adjusted so that they always store the PC of the actual > call instruction in the RAS, not the next PC. If the call instruction is a > microop, the next PC refers to the next microop in the same macroop which is > probably not desirable. The buildRetPC function advances the PC intelligently > to the next macroop (in an ISA specific way) so that that case works. > > > Change in stats: > > There were no change in stats except in MIPS and SPARC in the O3 model. MIPS > runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could > likely be improved further by setting call/return instruction flags and taking > advantage of the RAS. > > > TODO: > > Add != operators to the PCState classes, defined trivially to be !(a==b). > Smooth out places where PCs are split apart, passed around, and put back > together later. I think this might happen in SPARC's fault code. Add ISA > specific constructors that allow setting PC elements without calling a bunch > of accessors. Try to eliminate the need for the branching() function. Factor > out Alpha's PAL mode pc bit into a separate flag field, and eliminate places > where it's blindly masked out or tested in the PC. > > > Diffs > ----- > > src/arch/alpha/ev5.cc f166f8bd8818 > src/arch/alpha/faults.cc f166f8bd8818 > src/arch/alpha/interrupts.hh f166f8bd8818 > src/arch/alpha/isa/branch.isa f166f8bd8818 > src/arch/alpha/isa/decoder.isa f166f8bd8818 > src/arch/alpha/isa/main.isa f166f8bd8818 > src/arch/alpha/predecoder.hh f166f8bd8818 > src/arch/alpha/process.cc f166f8bd8818 > src/arch/alpha/remote_gdb.cc f166f8bd8818 > src/arch/alpha/stacktrace.cc f166f8bd8818 > src/arch/alpha/tlb.cc f166f8bd8818 > src/arch/alpha/types.hh f166f8bd8818 > src/arch/alpha/utility.hh f166f8bd8818 > src/arch/alpha/utility.cc f166f8bd8818 > src/arch/arm/faults.cc f166f8bd8818 > src/arch/arm/insts/macromem.hh f166f8bd8818 > src/arch/arm/insts/mem.hh f166f8bd8818 > src/arch/arm/insts/pred_inst.hh f166f8bd8818 > src/arch/arm/insts/static_inst.hh f166f8bd8818 > src/arch/arm/insts/vfp.hh f166f8bd8818 > src/arch/arm/isa.cc f166f8bd8818 > src/arch/arm/isa/formats/breakpoint.isa f166f8bd8818 > src/arch/arm/isa/insts/branch.isa f166f8bd8818 > src/arch/arm/isa/insts/data.isa f166f8bd8818 > src/arch/arm/isa/insts/ldr.isa f166f8bd8818 > src/arch/arm/isa/insts/macromem.isa f166f8bd8818 > src/arch/arm/isa/insts/misc.isa f166f8bd8818 > src/arch/arm/isa/operands.isa f166f8bd8818 > src/arch/arm/isa_traits.hh f166f8bd8818 > src/arch/arm/linux/system.cc f166f8bd8818 > src/arch/arm/nativetrace.cc f166f8bd8818 > src/arch/arm/predecoder.hh f166f8bd8818 > src/arch/arm/predecoder.cc f166f8bd8818 > src/arch/arm/process.cc f166f8bd8818 > src/arch/arm/system.hh f166f8bd8818 > src/arch/arm/table_walker.cc f166f8bd8818 > src/arch/arm/tlb.cc f166f8bd8818 > src/arch/arm/types.hh f166f8bd8818 > src/arch/arm/utility.hh f166f8bd8818 > src/arch/arm/utility.cc f166f8bd8818 > src/arch/generic/types.hh PRE-CREATION > src/arch/isa_parser.py f166f8bd8818 > src/arch/mips/isa/base.isa f166f8bd8818 > src/arch/mips/isa/decoder.isa f166f8bd8818 > src/arch/mips/isa/formats/branch.isa f166f8bd8818 > src/arch/mips/isa/includes.isa f166f8bd8818 > src/arch/mips/isa/operands.isa f166f8bd8818 > src/arch/mips/mt.hh f166f8bd8818 > src/arch/mips/predecoder.hh f166f8bd8818 > src/arch/mips/process.cc f166f8bd8818 > src/arch/mips/types.hh f166f8bd8818 > src/arch/mips/utility.hh f166f8bd8818 > src/arch/mips/utility.cc f166f8bd8818 > src/arch/power/insts/branch.hh f166f8bd8818 > src/arch/power/insts/branch.cc f166f8bd8818 > src/arch/power/insts/static_inst.hh f166f8bd8818 > src/arch/power/isa/decoder.isa f166f8bd8818 > src/arch/power/isa/formats/branch.isa f166f8bd8818 > src/arch/power/isa/formats/unknown.isa f166f8bd8818 > src/arch/power/isa/operands.isa f166f8bd8818 > src/arch/power/predecoder.hh f166f8bd8818 > src/arch/power/process.cc f166f8bd8818 > src/arch/power/types.hh f166f8bd8818 > src/arch/power/utility.hh f166f8bd8818 > src/arch/power/utility.cc f166f8bd8818 > src/arch/sparc/faults.cc f166f8bd8818 > src/arch/sparc/isa/base.isa f166f8bd8818 > src/arch/sparc/isa/decoder.isa f166f8bd8818 > src/arch/sparc/isa/formats/branch.isa f166f8bd8818 > src/arch/sparc/isa/formats/micro.isa f166f8bd8818 > src/arch/sparc/isa/operands.isa f166f8bd8818 > src/arch/sparc/nativetrace.cc f166f8bd8818 > src/arch/sparc/predecoder.hh f166f8bd8818 > src/arch/sparc/process.cc f166f8bd8818 > src/arch/sparc/remote_gdb.cc f166f8bd8818 > src/arch/sparc/types.hh f166f8bd8818 > src/arch/sparc/utility.hh f166f8bd8818 > src/arch/sparc/utility.cc f166f8bd8818 > src/arch/x86/faults.cc f166f8bd8818 > src/arch/x86/insts/macroop.hh f166f8bd8818 > src/arch/x86/insts/microop.hh f166f8bd8818 > src/arch/x86/insts/static_inst.hh f166f8bd8818 > src/arch/x86/isa/decoder/two_byte_opcodes.isa f166f8bd8818 > src/arch/x86/isa/formats/unknown.isa f166f8bd8818 > src/arch/x86/isa/microops/regop.isa f166f8bd8818 > src/arch/x86/isa/microops/seqop.isa f166f8bd8818 > src/arch/x86/isa/operands.isa f166f8bd8818 > src/arch/x86/nativetrace.cc f166f8bd8818 > src/arch/x86/predecoder.hh f166f8bd8818 > src/arch/x86/process.cc f166f8bd8818 > src/arch/x86/system.cc f166f8bd8818 > src/arch/x86/tlb.cc f166f8bd8818 > src/arch/x86/types.hh f166f8bd8818 > src/arch/x86/utility.hh f166f8bd8818 > src/arch/x86/utility.cc f166f8bd8818 > src/base/remote_gdb.cc f166f8bd8818 > src/base/types.hh f166f8bd8818 > src/cpu/base_dyn_inst.hh f166f8bd8818 > src/cpu/base_dyn_inst_impl.hh f166f8bd8818 > src/cpu/checker/cpu.hh f166f8bd8818 > src/cpu/exetrace.hh f166f8bd8818 > src/cpu/exetrace.cc f166f8bd8818 > src/cpu/inorder/comm.hh f166f8bd8818 > src/cpu/inorder/cpu.hh f166f8bd8818 > src/cpu/inorder/cpu.cc f166f8bd8818 > src/cpu/inorder/first_stage.cc f166f8bd8818 > src/cpu/inorder/inorder_dyn_inst.hh f166f8bd8818 > src/cpu/inorder/inorder_dyn_inst.cc f166f8bd8818 > src/cpu/inorder/inorder_trace.hh f166f8bd8818 > src/cpu/inorder/inorder_trace.cc f166f8bd8818 > src/cpu/inorder/pipeline_stage.hh f166f8bd8818 > src/cpu/inorder/pipeline_stage.cc f166f8bd8818 > src/cpu/inorder/resources/bpred_unit.hh f166f8bd8818 > src/cpu/inorder/resources/bpred_unit.cc f166f8bd8818 > src/cpu/inorder/resources/branch_predictor.cc f166f8bd8818 > src/cpu/inorder/resources/cache_unit.cc f166f8bd8818 > src/cpu/inorder/resources/execution_unit.cc f166f8bd8818 > src/cpu/inorder/resources/fetch_seq_unit.hh f166f8bd8818 > src/cpu/inorder/resources/fetch_seq_unit.cc f166f8bd8818 > src/cpu/inorder/resources/tlb_unit.hh f166f8bd8818 > src/cpu/inorder/thread_context.hh f166f8bd8818 > src/cpu/inorder/thread_context.cc f166f8bd8818 > src/cpu/inorder/thread_state.hh f166f8bd8818 > src/cpu/inteltrace.hh f166f8bd8818 > src/cpu/inteltrace.cc f166f8bd8818 > src/cpu/legiontrace.hh f166f8bd8818 > src/cpu/legiontrace.cc f166f8bd8818 > src/cpu/nativetrace.hh f166f8bd8818 > src/cpu/o3/bpred_unit.hh f166f8bd8818 > src/cpu/o3/bpred_unit_impl.hh f166f8bd8818 > src/cpu/o3/comm.hh f166f8bd8818 > src/cpu/o3/commit.hh f166f8bd8818 > src/cpu/o3/commit_impl.hh f166f8bd8818 > src/cpu/o3/cpu.hh f166f8bd8818 > src/cpu/o3/cpu.cc f166f8bd8818 > src/cpu/o3/decode_impl.hh f166f8bd8818 > src/cpu/o3/dep_graph.hh f166f8bd8818 > src/cpu/o3/dyn_inst.hh f166f8bd8818 > src/cpu/o3/dyn_inst_impl.hh f166f8bd8818 > src/cpu/o3/fetch.hh f166f8bd8818 > src/cpu/o3/fetch_impl.hh f166f8bd8818 > src/cpu/o3/iew_impl.hh f166f8bd8818 > src/cpu/o3/inst_queue_impl.hh f166f8bd8818 > src/cpu/o3/lsq_unit.hh f166f8bd8818 > src/cpu/o3/lsq_unit_impl.hh f166f8bd8818 > src/cpu/o3/mem_dep_unit_impl.hh f166f8bd8818 > src/cpu/o3/rename_impl.hh f166f8bd8818 > src/cpu/o3/rob_impl.hh f166f8bd8818 > src/cpu/o3/thread_context.hh f166f8bd8818 > src/cpu/o3/thread_context_impl.hh f166f8bd8818 > src/cpu/pc_event.cc f166f8bd8818 > src/cpu/pred/btb.hh f166f8bd8818 > src/cpu/pred/btb.cc f166f8bd8818 > src/cpu/pred/ras.hh f166f8bd8818 > src/cpu/pred/ras.cc f166f8bd8818 > src/cpu/simple/atomic.cc f166f8bd8818 > src/cpu/simple/base.hh f166f8bd8818 > src/cpu/simple/base.cc f166f8bd8818 > src/cpu/simple/timing.cc f166f8bd8818 > src/cpu/simple_thread.hh f166f8bd8818 > src/cpu/simple_thread.cc f166f8bd8818 > src/cpu/static_inst.hh f166f8bd8818 > src/cpu/static_inst.cc f166f8bd8818 > src/cpu/thread_context.hh f166f8bd8818 > src/cpu/thread_context.cc f166f8bd8818 > src/kern/system_events.cc f166f8bd8818 > src/kern/tru64/tru64.hh f166f8bd8818 > src/sim/faults.cc f166f8bd8818 > src/sim/insttracer.hh f166f8bd8818 > src/sim/syscall_emul.hh f166f8bd8818 > src/sim/syscall_emul.cc f166f8bd8818 > > Diff: http://reviews.m5sim.org/r/255/diff > > > Testing > ------- > > Quick regressions > At various times long regressions > Running all regressions now > Trace output while debugging > > > Thanks, > > Gabe > > _______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
