I've fixed a bug or two in ARM which showed up in the long regressions
(I think becaues those are the only ones that use thumb) and one in
ALPHA_FS, so now all the quick and long regressions have passed at some
point. I'm running them all again on this new version just to make sure
I haven't rebroken something. If you haven't yet, please take a look at
this patch or at least read its description. I'm not all that familiar
with checkpointing or, to a lesser extent, our remote gdb support
because I haven't really ever used it, so if someone could verify those
still work I'd appreciate it.

Also, if you want to use postreview to update a review request, what
worked for me is to use -u and then -e with the review request number
which in this case was 255. It clobbers any changes you may have made to
the description on the website, but I expect that wouldn't normally be a
problem.

hg postreview -o -u -e 255

Gabe

Gabe Black wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/255/
> -----------------------------------------------------------
>
> (Updated 2010-09-24 11:18:16.366356)
>
>
> Review request for Default.
>
>
> Summary (updated)
> -------
>
> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
>
>
> This change is a low level and pervasive reorganization of how PCs are managed
> in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
> the PC and the NPC, and the lsb of the PC signaled whether or not you were in
> PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
> micropc, x86 and ARM introduced variable length instruction sets, and ARM
> started to keep track of mode bits in the PC. Each CPU model handled PCs in
> its own custom way that needed to be updated individually to handle the new
> dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
> the complexity could be hidden in the ISA at the ISA implementation's expense.
> Areas like the branch predictor hadn't been updated to handle branch delay
> slots or micropcs, and it turns out that had introduced a significant (10s of
> percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
> perpetuate the problem by reworking O3 again to handle the PC features needed
> by x86, this change was introduced to rework PC handling in a more modular,
> transparent, and hopefully efficient way.
>
>
> PC type:
>
> Rather than having the superset of all possible elements of PC state declared
> in each of the CPU models, each ISA defines its own PCState type which has
> exactly the elements it needs. A cross product of canned PCState classes are
> defined in the new "generic" ISA directory for ISAs with/without delay slots
> and microcode. These are either typedef-ed or subclassed by each ISA. To read
> or write this structure through a *Context, you use the new pcState() accessor
> which reads or writes depending on whether it has an argument. If you just
> want the address of the current or next instruction or the current micro PC,
> you can get those through read-only accessors on either the PCState type or
> the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
> move away from readPC. That name is ambiguous since it's not clear whether or
> not it should be the actual address to fetch from, or if it should have extra
> bits in it like the PAL mode bit. Each class is free to define its own
> functions to get at whatever values it needs however it needs to to be used in
> ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
> PC and into a separate field like ARM.
>
> These types can be reset to a particular pc (where npc = pc +
> sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
> appropriate), printed, serialized, and compared. There is a branching()
> function which encapsulates code in the CPU models that checked if an
> instruction branched or not. Exactly what that means in the context of branch
> delay slots which can skip an instruction when not taken is ambiguous, and
> ideally this function and its uses can be eliminated. PCStates also generally
> know how to advance themselves in various ways depending on if they point at
> an instruction, a microop, or the last microop of a macroop. More on that
> later.
>
> Ideally, accessing all the PCs at once when setting them will improve
> performance of M5 even though more data needs to be moved around. This is
> because often all the PCs need to be manipulated together, and by getting them
> all at once you avoid multiple function calls. Also, the PCs of a particular
> thread will have spatial locality in the cache. Previously they were grouped
> by element in arrays which spread out accesses.
>
>
> Advancing the PC:
>
> The PCs were previously managed entirely by the CPU which had to know about PC
> semantics, try to figure out which dimension to increment the PC in, what to
> set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
> with the PC type itself. Because most of the information about how to
> increment the PC (mainly what type of instruction it refers to) is contained
> in the instruction object, a new advancePC virtual function was added to the
> StaticInst class. Subclasses provide an implementation that moves around the
> right element of the PC with a minimal amount of decision making. In ISAs like
> Alpha, the instructions always simply assign NPC to PC without having to worry
> about micropcs, nnpcs, etc. The added cost of a virtual function call should
> be outweighed by not having to figure out as much about what to do with the
> PCs and mucking around with the extra elements.
>
> One drawback of making the StaticInsts advance the PC is that you have to
> actually have one to advance the PC. This would, superficially, seem to
> require decoding an instruction before fetch could advance. This is, as far as
> I can tell, realistic. fetch would advance through memory addresses, not PCs,
> perhaps predicting new memory addresses using existing ones. More
> sophisticated decisions about control flow would be made later on, after the
> instruction was decoded, and handed back to fetch. If branching needs to
> happen, some amount of decoding needs to happen to see that it's a branch,
> what the target is, etc. This could get a little more complicated if that gets
> done by the predecoder, but I'm choosing to ignore that for now.
>
>
> Variable length instructions:
>
> To handle variable length instructions in x86 and ARM, the predecoder now
> takes in the current by reference to the getExtMachInst function. It can
> modify the PC however it needs to (by setting NPC to be the PC + instruction
> length, for instance). This could be improved since the CPU doesn't know if
> the PC was modified and always has to write it back.
>
>
> ISA parser:
>
> To support the new API, all PC related operand types were removed from the
> parser and replaced with a PCState type. There are two warts on this
> implementation. First, as with all the other operand types, the PCState still
> has to have a valid operand type even though it doesn't use it. Second, using
> syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
> syntax for operand type overriding, and the parser can't figure out if you're
> reading or writing. Instructions that use the PCS operand (which I've
> consistently called it) need to first read it into a local variable,
> manipulate it, and then write it back out.
>
>
> Return address stack:
>
> The return address stack needed a little extra help because, in the presence
> of branch delay slots, it has to merge together elements of the return PC and
> the call PC. To handle that, a buildRetPC utility function was added. There
> are basically only two versions in all the ISAs, but it didn't seem short
> enough to put into the generic ISA directory. Also, the branch predictor code
> in O3 and InOrder were adjusted so that they always store the PC of the actual
> call instruction in the RAS, not the next PC. If the call instruction is a
> microop, the next PC refers to the next microop in the same macroop which is
> probably not desirable. The buildRetPC function advances the PC intelligently
> to the next macroop (in an ISA specific way) so that that case works.
>
>
> Change in stats:
>
> There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
> runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
> likely be improved further by setting call/return instruction flags and taking
> advantage of the RAS.
>
>
> TODO:
>
> Add != operators to the PCState classes, defined trivially to be !(a==b).
> Smooth out places where PCs are split apart, passed around, and put back
> together later. I think this might happen in SPARC's fault code. Add ISA
> specific constructors that allow setting PC elements without calling a bunch
> of accessors. Try to eliminate the need for the branching() function. Factor
> out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
> where it's blindly masked out or tested in the PC.
>
>
> Diffs (updated)
> -----
>
>   src/arch/alpha/ev5.cc ad9041185274 
>   src/arch/alpha/faults.cc ad9041185274 
>   src/arch/alpha/interrupts.hh ad9041185274 
>   src/arch/alpha/isa/branch.isa ad9041185274 
>   src/arch/alpha/isa/decoder.isa ad9041185274 
>   src/arch/alpha/isa/main.isa ad9041185274 
>   src/arch/alpha/predecoder.hh ad9041185274 
>   src/arch/alpha/process.cc ad9041185274 
>   src/arch/alpha/remote_gdb.cc ad9041185274 
>   src/arch/alpha/stacktrace.cc ad9041185274 
>   src/arch/alpha/tlb.cc ad9041185274 
>   src/arch/alpha/types.hh ad9041185274 
>   src/arch/alpha/utility.hh ad9041185274 
>   src/arch/alpha/utility.cc ad9041185274 
>   src/arch/arm/faults.cc ad9041185274 
>   src/arch/arm/insts/macromem.hh ad9041185274 
>   src/arch/arm/insts/mem.hh ad9041185274 
>   src/arch/arm/insts/pred_inst.hh ad9041185274 
>   src/arch/arm/insts/static_inst.hh ad9041185274 
>   src/arch/arm/insts/vfp.hh ad9041185274 
>   src/arch/arm/isa.cc ad9041185274 
>   src/arch/arm/isa/formats/breakpoint.isa ad9041185274 
>   src/arch/arm/isa/insts/branch.isa ad9041185274 
>   src/arch/arm/isa/insts/data.isa ad9041185274 
>   src/arch/arm/isa/insts/ldr.isa ad9041185274 
>   src/arch/arm/isa/insts/macromem.isa ad9041185274 
>   src/arch/arm/isa/insts/misc.isa ad9041185274 
>   src/arch/arm/isa/operands.isa ad9041185274 
>   src/arch/arm/isa_traits.hh ad9041185274 
>   src/arch/arm/linux/system.cc ad9041185274 
>   src/arch/arm/nativetrace.cc ad9041185274 
>   src/arch/arm/predecoder.hh ad9041185274 
>   src/arch/arm/predecoder.cc ad9041185274 
>   src/arch/arm/process.cc ad9041185274 
>   src/arch/arm/table_walker.cc ad9041185274 
>   src/arch/arm/tlb.cc ad9041185274 
>   src/arch/arm/types.hh ad9041185274 
>   src/arch/arm/utility.hh ad9041185274 
>   src/arch/generic/types.hh PRE-CREATION 
>   src/arch/isa_parser.py ad9041185274 
>   src/arch/mips/isa/base.isa ad9041185274 
>   src/arch/mips/isa/decoder.isa ad9041185274 
>   src/arch/mips/isa/formats/branch.isa ad9041185274 
>   src/arch/mips/isa/includes.isa ad9041185274 
>   src/arch/mips/isa/operands.isa ad9041185274 
>   src/arch/mips/mt.hh ad9041185274 
>   src/arch/mips/predecoder.hh ad9041185274 
>   src/arch/mips/process.cc ad9041185274 
>   src/arch/mips/types.hh ad9041185274 
>   src/arch/mips/utility.hh ad9041185274 
>   src/arch/power/insts/branch.hh ad9041185274 
>   src/arch/power/insts/branch.cc ad9041185274 
>   src/arch/power/insts/static_inst.hh ad9041185274 
>   src/arch/power/isa/decoder.isa ad9041185274 
>   src/arch/power/isa/formats/branch.isa ad9041185274 
>   src/arch/power/isa/formats/unknown.isa ad9041185274 
>   src/arch/power/isa/operands.isa ad9041185274 
>   src/arch/power/predecoder.hh ad9041185274 
>   src/arch/power/process.cc ad9041185274 
>   src/arch/power/types.hh ad9041185274 
>   src/arch/power/utility.hh ad9041185274 
>   src/arch/power/utility.cc ad9041185274 
>   src/arch/sparc/faults.cc ad9041185274 
>   src/arch/sparc/isa/base.isa ad9041185274 
>   src/arch/sparc/isa/decoder.isa ad9041185274 
>   src/arch/sparc/isa/formats/branch.isa ad9041185274 
>   src/arch/sparc/isa/formats/micro.isa ad9041185274 
>   src/arch/sparc/isa/operands.isa ad9041185274 
>   src/arch/sparc/nativetrace.cc ad9041185274 
>   src/arch/sparc/predecoder.hh ad9041185274 
>   src/arch/sparc/process.cc ad9041185274 
>   src/arch/sparc/remote_gdb.cc ad9041185274 
>   src/arch/sparc/types.hh ad9041185274 
>   src/arch/sparc/utility.hh ad9041185274 
>   src/arch/sparc/utility.cc ad9041185274 
>   src/arch/x86/faults.cc ad9041185274 
>   src/arch/x86/insts/macroop.hh ad9041185274 
>   src/arch/x86/insts/microop.hh ad9041185274 
>   src/arch/x86/insts/static_inst.hh ad9041185274 
>   src/arch/x86/isa/decoder/two_byte_opcodes.isa ad9041185274 
>   src/arch/x86/isa/formats/unknown.isa ad9041185274 
>   src/arch/x86/isa/microops/regop.isa ad9041185274 
>   src/arch/x86/isa/microops/seqop.isa ad9041185274 
>   src/arch/x86/isa/operands.isa ad9041185274 
>   src/arch/x86/nativetrace.cc ad9041185274 
>   src/arch/x86/predecoder.hh ad9041185274 
>   src/arch/x86/process.cc ad9041185274 
>   src/arch/x86/system.cc ad9041185274 
>   src/arch/x86/tlb.cc ad9041185274 
>   src/arch/x86/types.hh ad9041185274 
>   src/arch/x86/utility.hh ad9041185274 
>   src/arch/x86/utility.cc ad9041185274 
>   src/base/remote_gdb.cc ad9041185274 
>   src/base/types.hh ad9041185274 
>   src/cpu/base_dyn_inst.hh ad9041185274 
>   src/cpu/base_dyn_inst_impl.hh ad9041185274 
>   src/cpu/checker/cpu.hh ad9041185274 
>   src/cpu/exetrace.hh ad9041185274 
>   src/cpu/exetrace.cc ad9041185274 
>   src/cpu/inorder/comm.hh ad9041185274 
>   src/cpu/inorder/cpu.hh ad9041185274 
>   src/cpu/inorder/cpu.cc ad9041185274 
>   src/cpu/inorder/first_stage.cc ad9041185274 
>   src/cpu/inorder/inorder_dyn_inst.hh ad9041185274 
>   src/cpu/inorder/inorder_dyn_inst.cc ad9041185274 
>   src/cpu/inorder/inorder_trace.hh ad9041185274 
>   src/cpu/inorder/inorder_trace.cc ad9041185274 
>   src/cpu/inorder/pipeline_stage.hh ad9041185274 
>   src/cpu/inorder/pipeline_stage.cc ad9041185274 
>   src/cpu/inorder/resource_pool.9stage.cc ad9041185274 
>   src/cpu/inorder/resources/bpred_unit.hh ad9041185274 
>   src/cpu/inorder/resources/bpred_unit.cc ad9041185274 
>   src/cpu/inorder/resources/branch_predictor.cc ad9041185274 
>   src/cpu/inorder/resources/cache_unit.cc ad9041185274 
>   src/cpu/inorder/resources/execution_unit.cc ad9041185274 
>   src/cpu/inorder/resources/fetch_seq_unit.hh ad9041185274 
>   src/cpu/inorder/resources/fetch_seq_unit.cc ad9041185274 
>   src/cpu/inorder/resources/tlb_unit.hh ad9041185274 
>   src/cpu/inorder/thread_context.hh ad9041185274 
>   src/cpu/inorder/thread_context.cc ad9041185274 
>   src/cpu/inorder/thread_state.hh ad9041185274 
>   src/cpu/inteltrace.hh ad9041185274 
>   src/cpu/inteltrace.cc ad9041185274 
>   src/cpu/legiontrace.hh ad9041185274 
>   src/cpu/legiontrace.cc ad9041185274 
>   src/cpu/nativetrace.hh ad9041185274 
>   src/cpu/o3/bpred_unit.hh ad9041185274 
>   src/cpu/o3/bpred_unit_impl.hh ad9041185274 
>   src/cpu/o3/comm.hh ad9041185274 
>   src/cpu/o3/commit.hh ad9041185274 
>   src/cpu/o3/commit_impl.hh ad9041185274 
>   src/cpu/o3/cpu.hh ad9041185274 
>   src/cpu/o3/cpu.cc ad9041185274 
>   src/cpu/o3/decode_impl.hh ad9041185274 
>   src/cpu/o3/dep_graph.hh ad9041185274 
>   src/cpu/o3/dyn_inst.hh ad9041185274 
>   src/cpu/o3/dyn_inst_impl.hh ad9041185274 
>   src/cpu/o3/fetch.hh ad9041185274 
>   src/cpu/o3/fetch_impl.hh ad9041185274 
>   src/cpu/o3/iew_impl.hh ad9041185274 
>   src/cpu/o3/inst_queue_impl.hh ad9041185274 
>   src/cpu/o3/lsq_unit.hh ad9041185274 
>   src/cpu/o3/lsq_unit_impl.hh ad9041185274 
>   src/cpu/o3/mem_dep_unit_impl.hh ad9041185274 
>   src/cpu/o3/rename_impl.hh ad9041185274 
>   src/cpu/o3/rob_impl.hh ad9041185274 
>   src/cpu/o3/thread_context.hh ad9041185274 
>   src/cpu/o3/thread_context_impl.hh ad9041185274 
>   src/cpu/pc_event.cc ad9041185274 
>   src/cpu/pred/btb.hh ad9041185274 
>   src/cpu/pred/btb.cc ad9041185274 
>   src/cpu/pred/ras.hh ad9041185274 
>   src/cpu/pred/ras.cc ad9041185274 
>   src/cpu/simple/atomic.cc ad9041185274 
>   src/cpu/simple/base.hh ad9041185274 
>   src/cpu/simple/base.cc ad9041185274 
>   src/cpu/simple/timing.cc ad9041185274 
>   src/cpu/simple_thread.hh ad9041185274 
>   src/cpu/simple_thread.cc ad9041185274 
>   src/cpu/static_inst.hh ad9041185274 
>   src/cpu/static_inst.cc ad9041185274 
>   src/cpu/thread_context.hh ad9041185274 
>   src/cpu/thread_context.cc ad9041185274 
>   src/kern/system_events.cc ad9041185274 
>   src/kern/tru64/tru64.hh ad9041185274 
>   src/sim/faults.cc ad9041185274 
>   src/sim/insttracer.hh ad9041185274 
>   src/sim/syscall_emul.hh ad9041185274 
>   src/sim/syscall_emul.cc ad9041185274 
>
> Diff: http://reviews.m5sim.org/r/255/diff
>
>
> Testing
> -------
>
> Quick regressions
> At various times long regressions
> Running all regressions now
> Trace output while debugging
>
>
> Thanks,
>
> Gabe
>
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>   

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to