-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/255/#review424
-----------------------------------------------------------



src/arch/arm/faults.cc
<http://reviews.m5sim.org/r/255/#comment632>

    This doesn't look light.... SCTLR.ThumbExceptions isn't ever checked


- Ali


On 2010-10-27 21:48:56, Gabe Black wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/255/
> -----------------------------------------------------------
> 
> (Updated 2010-10-27 21:48:56)
> 
> 
> Review request for Default.
> 
> 
> Summary
> -------
> 
> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
> 
> 
> This change is a low level and pervasive reorganization of how PCs are managed
> in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
> the PC and the NPC, and the lsb of the PC signaled whether or not you were in
> PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
> micropc, x86 and ARM introduced variable length instruction sets, and ARM
> started to keep track of mode bits in the PC. Each CPU model handled PCs in
> its own custom way that needed to be updated individually to handle the new
> dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
> the complexity could be hidden in the ISA at the ISA implementation's expense.
> Areas like the branch predictor hadn't been updated to handle branch delay
> slots or micropcs, and it turns out that had introduced a significant (10s of
> percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
> perpetuate the problem by reworking O3 again to handle the PC features needed
> by x86, this change was introduced to rework PC handling in a more modular,
> transparent, and hopefully efficient way.
> 
> 
> PC type:
> 
> Rather than having the superset of all possible elements of PC state declared
> in each of the CPU models, each ISA defines its own PCState type which has
> exactly the elements it needs. A cross product of canned PCState classes are
> defined in the new "generic" ISA directory for ISAs with/without delay slots
> and microcode. These are either typedef-ed or subclassed by each ISA. To read
> or write this structure through a *Context, you use the new pcState() accessor
> which reads or writes depending on whether it has an argument. If you just
> want the address of the current or next instruction or the current micro PC,
> you can get those through read-only accessors on either the PCState type or
> the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
> move away from readPC. That name is ambiguous since it's not clear whether or
> not it should be the actual address to fetch from, or if it should have extra
> bits in it like the PAL mode bit. Each class is free to define its own
> functions to get at whatever values it needs however it needs to to be used in
> ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
> PC and into a separate field like ARM.
> 
> These types can be reset to a particular pc (where npc = pc +
> sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
> appropriate), printed, serialized, and compared. There is a branching()
> function which encapsulates code in the CPU models that checked if an
> instruction branched or not. Exactly what that means in the context of branch
> delay slots which can skip an instruction when not taken is ambiguous, and
> ideally this function and its uses can be eliminated. PCStates also generally
> know how to advance themselves in various ways depending on if they point at
> an instruction, a microop, or the last microop of a macroop. More on that
> later.
> 
> Ideally, accessing all the PCs at once when setting them will improve
> performance of M5 even though more data needs to be moved around. This is
> because often all the PCs need to be manipulated together, and by getting them
> all at once you avoid multiple function calls. Also, the PCs of a particular
> thread will have spatial locality in the cache. Previously they were grouped
> by element in arrays which spread out accesses.
> 
> 
> Advancing the PC:
> 
> The PCs were previously managed entirely by the CPU which had to know about PC
> semantics, try to figure out which dimension to increment the PC in, what to
> set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
> with the PC type itself. Because most of the information about how to
> increment the PC (mainly what type of instruction it refers to) is contained
> in the instruction object, a new advancePC virtual function was added to the
> StaticInst class. Subclasses provide an implementation that moves around the
> right element of the PC with a minimal amount of decision making. In ISAs like
> Alpha, the instructions always simply assign NPC to PC without having to worry
> about micropcs, nnpcs, etc. The added cost of a virtual function call should
> be outweighed by not having to figure out as much about what to do with the
> PCs and mucking around with the extra elements.
> 
> One drawback of making the StaticInsts advance the PC is that you have to
> actually have one to advance the PC. This would, superficially, seem to
> require decoding an instruction before fetch could advance. This is, as far as
> I can tell, realistic. fetch would advance through memory addresses, not PCs,
> perhaps predicting new memory addresses using existing ones. More
> sophisticated decisions about control flow would be made later on, after the
> instruction was decoded, and handed back to fetch. If branching needs to
> happen, some amount of decoding needs to happen to see that it's a branch,
> what the target is, etc. This could get a little more complicated if that gets
> done by the predecoder, but I'm choosing to ignore that for now.
> 
> 
> Variable length instructions:
> 
> To handle variable length instructions in x86 and ARM, the predecoder now
> takes in the current PC by reference to the getExtMachInst function. It can
> modify the PC however it needs to (by setting NPC to be the PC + instruction
> length, for instance). This could be improved since the CPU doesn't know if
> the PC was modified and always has to write it back.
> 
> 
> ISA parser:
> 
> To support the new API, all PC related operand types were removed from the
> parser and replaced with a PCState type. There are two warts on this
> implementation. First, as with all the other operand types, the PCState still
> has to have a valid operand type even though it doesn't use it. Second, using
> syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
> syntax for operand type overriding, and the parser can't figure out if you're
> reading or writing. Instructions that use the PCS operand (which I've
> consistently called it) need to first read it into a local variable,
> manipulate it, and then write it back out.
> 
> 
> Return address stack:
> 
> The return address stack needed a little extra help because, in the presence
> of branch delay slots, it has to merge together elements of the return PC and
> the call PC. To handle that, a buildRetPC utility function was added. There
> are basically only two versions in all the ISAs, but it didn't seem short
> enough to put into the generic ISA directory. Also, the branch predictor code
> in O3 and InOrder were adjusted so that they always store the PC of the actual
> call instruction in the RAS, not the next PC. If the call instruction is a
> microop, the next PC refers to the next microop in the same macroop which is
> probably not desirable. The buildRetPC function advances the PC intelligently
> to the next macroop (in an ISA specific way) so that that case works.
> 
> 
> Change in stats:
> 
> There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
> runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could
> likely be improved further by setting call/return instruction flags and taking
> advantage of the RAS.
> 
> 
> TODO:
> 
> Add != operators to the PCState classes, defined trivially to be !(a==b).
> Smooth out places where PCs are split apart, passed around, and put back
> together later. I think this might happen in SPARC's fault code. Add ISA
> specific constructors that allow setting PC elements without calling a bunch
> of accessors. Try to eliminate the need for the branching() function. Factor
> out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
> where it's blindly masked out or tested in the PC.
> 
> 
> Diffs
> -----
> 
>   src/arch/alpha/ev5.cc f166f8bd8818 
>   src/arch/alpha/faults.cc f166f8bd8818 
>   src/arch/alpha/interrupts.hh f166f8bd8818 
>   src/arch/alpha/isa/branch.isa f166f8bd8818 
>   src/arch/alpha/isa/decoder.isa f166f8bd8818 
>   src/arch/alpha/isa/main.isa f166f8bd8818 
>   src/arch/alpha/predecoder.hh f166f8bd8818 
>   src/arch/alpha/process.cc f166f8bd8818 
>   src/arch/alpha/remote_gdb.cc f166f8bd8818 
>   src/arch/alpha/stacktrace.cc f166f8bd8818 
>   src/arch/alpha/tlb.cc f166f8bd8818 
>   src/arch/alpha/types.hh f166f8bd8818 
>   src/arch/alpha/utility.hh f166f8bd8818 
>   src/arch/alpha/utility.cc f166f8bd8818 
>   src/arch/arm/faults.cc f166f8bd8818 
>   src/arch/arm/insts/macromem.hh f166f8bd8818 
>   src/arch/arm/insts/mem.hh f166f8bd8818 
>   src/arch/arm/insts/pred_inst.hh f166f8bd8818 
>   src/arch/arm/insts/static_inst.hh f166f8bd8818 
>   src/arch/arm/insts/vfp.hh f166f8bd8818 
>   src/arch/arm/isa.cc f166f8bd8818 
>   src/arch/arm/isa/formats/breakpoint.isa f166f8bd8818 
>   src/arch/arm/isa/insts/branch.isa f166f8bd8818 
>   src/arch/arm/isa/insts/data.isa f166f8bd8818 
>   src/arch/arm/isa/insts/ldr.isa f166f8bd8818 
>   src/arch/arm/isa/insts/macromem.isa f166f8bd8818 
>   src/arch/arm/isa/insts/misc.isa f166f8bd8818 
>   src/arch/arm/isa/operands.isa f166f8bd8818 
>   src/arch/arm/isa_traits.hh f166f8bd8818 
>   src/arch/arm/linux/system.cc f166f8bd8818 
>   src/arch/arm/nativetrace.cc f166f8bd8818 
>   src/arch/arm/predecoder.hh f166f8bd8818 
>   src/arch/arm/predecoder.cc f166f8bd8818 
>   src/arch/arm/process.cc f166f8bd8818 
>   src/arch/arm/system.hh f166f8bd8818 
>   src/arch/arm/table_walker.cc f166f8bd8818 
>   src/arch/arm/tlb.cc f166f8bd8818 
>   src/arch/arm/types.hh f166f8bd8818 
>   src/arch/arm/utility.hh f166f8bd8818 
>   src/arch/arm/utility.cc f166f8bd8818 
>   src/arch/generic/types.hh PRE-CREATION 
>   src/arch/isa_parser.py f166f8bd8818 
>   src/arch/mips/isa/base.isa f166f8bd8818 
>   src/arch/mips/isa/decoder.isa f166f8bd8818 
>   src/arch/mips/isa/formats/branch.isa f166f8bd8818 
>   src/arch/mips/isa/includes.isa f166f8bd8818 
>   src/arch/mips/isa/operands.isa f166f8bd8818 
>   src/arch/mips/mt.hh f166f8bd8818 
>   src/arch/mips/predecoder.hh f166f8bd8818 
>   src/arch/mips/process.cc f166f8bd8818 
>   src/arch/mips/types.hh f166f8bd8818 
>   src/arch/mips/utility.hh f166f8bd8818 
>   src/arch/mips/utility.cc f166f8bd8818 
>   src/arch/power/insts/branch.hh f166f8bd8818 
>   src/arch/power/insts/branch.cc f166f8bd8818 
>   src/arch/power/insts/static_inst.hh f166f8bd8818 
>   src/arch/power/isa/decoder.isa f166f8bd8818 
>   src/arch/power/isa/formats/branch.isa f166f8bd8818 
>   src/arch/power/isa/formats/unknown.isa f166f8bd8818 
>   src/arch/power/isa/operands.isa f166f8bd8818 
>   src/arch/power/predecoder.hh f166f8bd8818 
>   src/arch/power/process.cc f166f8bd8818 
>   src/arch/power/types.hh f166f8bd8818 
>   src/arch/power/utility.hh f166f8bd8818 
>   src/arch/power/utility.cc f166f8bd8818 
>   src/arch/sparc/faults.cc f166f8bd8818 
>   src/arch/sparc/isa/base.isa f166f8bd8818 
>   src/arch/sparc/isa/decoder.isa f166f8bd8818 
>   src/arch/sparc/isa/formats/branch.isa f166f8bd8818 
>   src/arch/sparc/isa/formats/micro.isa f166f8bd8818 
>   src/arch/sparc/isa/operands.isa f166f8bd8818 
>   src/arch/sparc/nativetrace.cc f166f8bd8818 
>   src/arch/sparc/predecoder.hh f166f8bd8818 
>   src/arch/sparc/process.cc f166f8bd8818 
>   src/arch/sparc/remote_gdb.cc f166f8bd8818 
>   src/arch/sparc/types.hh f166f8bd8818 
>   src/arch/sparc/utility.hh f166f8bd8818 
>   src/arch/sparc/utility.cc f166f8bd8818 
>   src/arch/x86/faults.cc f166f8bd8818 
>   src/arch/x86/insts/macroop.hh f166f8bd8818 
>   src/arch/x86/insts/microop.hh f166f8bd8818 
>   src/arch/x86/insts/static_inst.hh f166f8bd8818 
>   src/arch/x86/isa/decoder/two_byte_opcodes.isa f166f8bd8818 
>   src/arch/x86/isa/formats/unknown.isa f166f8bd8818 
>   src/arch/x86/isa/microops/regop.isa f166f8bd8818 
>   src/arch/x86/isa/microops/seqop.isa f166f8bd8818 
>   src/arch/x86/isa/operands.isa f166f8bd8818 
>   src/arch/x86/nativetrace.cc f166f8bd8818 
>   src/arch/x86/predecoder.hh f166f8bd8818 
>   src/arch/x86/process.cc f166f8bd8818 
>   src/arch/x86/system.cc f166f8bd8818 
>   src/arch/x86/tlb.cc f166f8bd8818 
>   src/arch/x86/types.hh f166f8bd8818 
>   src/arch/x86/utility.hh f166f8bd8818 
>   src/arch/x86/utility.cc f166f8bd8818 
>   src/base/remote_gdb.cc f166f8bd8818 
>   src/base/types.hh f166f8bd8818 
>   src/cpu/base_dyn_inst.hh f166f8bd8818 
>   src/cpu/base_dyn_inst_impl.hh f166f8bd8818 
>   src/cpu/checker/cpu.hh f166f8bd8818 
>   src/cpu/exetrace.hh f166f8bd8818 
>   src/cpu/exetrace.cc f166f8bd8818 
>   src/cpu/inorder/comm.hh f166f8bd8818 
>   src/cpu/inorder/cpu.hh f166f8bd8818 
>   src/cpu/inorder/cpu.cc f166f8bd8818 
>   src/cpu/inorder/first_stage.cc f166f8bd8818 
>   src/cpu/inorder/inorder_dyn_inst.hh f166f8bd8818 
>   src/cpu/inorder/inorder_dyn_inst.cc f166f8bd8818 
>   src/cpu/inorder/inorder_trace.hh f166f8bd8818 
>   src/cpu/inorder/inorder_trace.cc f166f8bd8818 
>   src/cpu/inorder/pipeline_stage.hh f166f8bd8818 
>   src/cpu/inorder/pipeline_stage.cc f166f8bd8818 
>   src/cpu/inorder/resources/bpred_unit.hh f166f8bd8818 
>   src/cpu/inorder/resources/bpred_unit.cc f166f8bd8818 
>   src/cpu/inorder/resources/branch_predictor.cc f166f8bd8818 
>   src/cpu/inorder/resources/cache_unit.cc f166f8bd8818 
>   src/cpu/inorder/resources/execution_unit.cc f166f8bd8818 
>   src/cpu/inorder/resources/fetch_seq_unit.hh f166f8bd8818 
>   src/cpu/inorder/resources/fetch_seq_unit.cc f166f8bd8818 
>   src/cpu/inorder/resources/tlb_unit.hh f166f8bd8818 
>   src/cpu/inorder/thread_context.hh f166f8bd8818 
>   src/cpu/inorder/thread_context.cc f166f8bd8818 
>   src/cpu/inorder/thread_state.hh f166f8bd8818 
>   src/cpu/inteltrace.hh f166f8bd8818 
>   src/cpu/inteltrace.cc f166f8bd8818 
>   src/cpu/legiontrace.hh f166f8bd8818 
>   src/cpu/legiontrace.cc f166f8bd8818 
>   src/cpu/nativetrace.hh f166f8bd8818 
>   src/cpu/o3/bpred_unit.hh f166f8bd8818 
>   src/cpu/o3/bpred_unit_impl.hh f166f8bd8818 
>   src/cpu/o3/comm.hh f166f8bd8818 
>   src/cpu/o3/commit.hh f166f8bd8818 
>   src/cpu/o3/commit_impl.hh f166f8bd8818 
>   src/cpu/o3/cpu.hh f166f8bd8818 
>   src/cpu/o3/cpu.cc f166f8bd8818 
>   src/cpu/o3/decode_impl.hh f166f8bd8818 
>   src/cpu/o3/dep_graph.hh f166f8bd8818 
>   src/cpu/o3/dyn_inst.hh f166f8bd8818 
>   src/cpu/o3/dyn_inst_impl.hh f166f8bd8818 
>   src/cpu/o3/fetch.hh f166f8bd8818 
>   src/cpu/o3/fetch_impl.hh f166f8bd8818 
>   src/cpu/o3/iew_impl.hh f166f8bd8818 
>   src/cpu/o3/inst_queue_impl.hh f166f8bd8818 
>   src/cpu/o3/lsq_unit.hh f166f8bd8818 
>   src/cpu/o3/lsq_unit_impl.hh f166f8bd8818 
>   src/cpu/o3/mem_dep_unit_impl.hh f166f8bd8818 
>   src/cpu/o3/rename_impl.hh f166f8bd8818 
>   src/cpu/o3/rob_impl.hh f166f8bd8818 
>   src/cpu/o3/thread_context.hh f166f8bd8818 
>   src/cpu/o3/thread_context_impl.hh f166f8bd8818 
>   src/cpu/pc_event.cc f166f8bd8818 
>   src/cpu/pred/btb.hh f166f8bd8818 
>   src/cpu/pred/btb.cc f166f8bd8818 
>   src/cpu/pred/ras.hh f166f8bd8818 
>   src/cpu/pred/ras.cc f166f8bd8818 
>   src/cpu/simple/atomic.cc f166f8bd8818 
>   src/cpu/simple/base.hh f166f8bd8818 
>   src/cpu/simple/base.cc f166f8bd8818 
>   src/cpu/simple/timing.cc f166f8bd8818 
>   src/cpu/simple_thread.hh f166f8bd8818 
>   src/cpu/simple_thread.cc f166f8bd8818 
>   src/cpu/static_inst.hh f166f8bd8818 
>   src/cpu/static_inst.cc f166f8bd8818 
>   src/cpu/thread_context.hh f166f8bd8818 
>   src/cpu/thread_context.cc f166f8bd8818 
>   src/kern/system_events.cc f166f8bd8818 
>   src/kern/tru64/tru64.hh f166f8bd8818 
>   src/sim/faults.cc f166f8bd8818 
>   src/sim/insttracer.hh f166f8bd8818 
>   src/sim/syscall_emul.hh f166f8bd8818 
>   src/sim/syscall_emul.cc f166f8bd8818 
> 
> Diff: http://reviews.m5sim.org/r/255/diff
> 
> 
> Testing
> -------
> 
> Quick regressions
> At various times long regressions
> Running all regressions now
> Trace output while debugging
> 
> 
> Thanks,
> 
> Gabe
> 
>

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to