It looks like I confused reviewboard. My replies are there, though, if
you follow the link.

Gabe

On 10/24/10 00:30, Gabe Black wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/255/#review413
> -----------------------------------------------------------
>
>
> - Gabe
>
>
> On 2010-10-14 00:01:15, Gabe Black wrote:
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> http://reviews.m5sim.org/r/255/
>> -----------------------------------------------------------
>>
>> (Updated 2010-10-14 00:01:15)
>>
>>
>> Review request for Default.
>>
>>
>> Summary
>> -------
>>
>> ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.
>>
>>
>> This change is a low level and pervasive reorganization of how PCs are 
>> managed
>> in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry 
>> about,
>> the PC and the NPC, and the lsb of the PC signaled whether or not you were in
>> PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
>> micropc, x86 and ARM introduced variable length instruction sets, and ARM
>> started to keep track of mode bits in the PC. Each CPU model handled PCs in
>> its own custom way that needed to be updated individually to handle the new
>> dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack,
>> the complexity could be hidden in the ISA at the ISA implementation's 
>> expense.
>> Areas like the branch predictor hadn't been updated to handle branch delay
>> slots or micropcs, and it turns out that had introduced a significant (10s of
>> percent) performance bug in SPARC and to a lesser extend MIPS. Rather than
>> perpetuate the problem by reworking O3 again to handle the PC features needed
>> by x86, this change was introduced to rework PC handling in a more modular,
>> transparent, and hopefully efficient way.
>>
>>
>> PC type:
>>
>> Rather than having the superset of all possible elements of PC state declared
>> in each of the CPU models, each ISA defines its own PCState type which has
>> exactly the elements it needs. A cross product of canned PCState classes are
>> defined in the new "generic" ISA directory for ISAs with/without delay slots
>> and microcode. These are either typedef-ed or subclassed by each ISA. To read
>> or write this structure through a *Context, you use the new pcState() 
>> accessor
>> which reads or writes depending on whether it has an argument. If you just
>> want the address of the current or next instruction or the current micro PC,
>> you can get those through read-only accessors on either the PCState type or
>> the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
>> move away from readPC. That name is ambiguous since it's not clear whether or
>> not it should be the actual address to fetch from, or if it should have extra
>> bits in it like the PAL mode bit. Each class is free to define its own
>> functions to get at whatever values it needs however it needs to to be used 
>> in
>> ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
>> PC and into a separate field like ARM.
>>
>> These types can be reset to a particular pc (where npc = pc +
>> sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
>> appropriate), printed, serialized, and compared. There is a branching()
>> function which encapsulates code in the CPU models that checked if an
>> instruction branched or not. Exactly what that means in the context of branch
>> delay slots which can skip an instruction when not taken is ambiguous, and
>> ideally this function and its uses can be eliminated. PCStates also generally
>> know how to advance themselves in various ways depending on if they point at
>> an instruction, a microop, or the last microop of a macroop. More on that
>> later.
>>
>> Ideally, accessing all the PCs at once when setting them will improve
>> performance of M5 even though more data needs to be moved around. This is
>> because often all the PCs need to be manipulated together, and by getting 
>> them
>> all at once you avoid multiple function calls. Also, the PCs of a particular
>> thread will have spatial locality in the cache. Previously they were grouped
>> by element in arrays which spread out accesses.
>>
>>
>> Advancing the PC:
>>
>> The PCs were previously managed entirely by the CPU which had to know about 
>> PC
>> semantics, try to figure out which dimension to increment the PC in, what to
>> set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
>> with the PC type itself. Because most of the information about how to
>> increment the PC (mainly what type of instruction it refers to) is contained
>> in the instruction object, a new advancePC virtual function was added to the
>> StaticInst class. Subclasses provide an implementation that moves around the
>> right element of the PC with a minimal amount of decision making. In ISAs 
>> like
>> Alpha, the instructions always simply assign NPC to PC without having to 
>> worry
>> about micropcs, nnpcs, etc. The added cost of a virtual function call should
>> be outweighed by not having to figure out as much about what to do with the
>> PCs and mucking around with the extra elements.
>>
>> One drawback of making the StaticInsts advance the PC is that you have to
>> actually have one to advance the PC. This would, superficially, seem to
>> require decoding an instruction before fetch could advance. This is, as far 
>> as
>> I can tell, realistic. fetch would advance through memory addresses, not PCs,
>> perhaps predicting new memory addresses using existing ones. More
>> sophisticated decisions about control flow would be made later on, after the
>> instruction was decoded, and handed back to fetch. If branching needs to
>> happen, some amount of decoding needs to happen to see that it's a branch,
>> what the target is, etc. This could get a little more complicated if that 
>> gets
>> done by the predecoder, but I'm choosing to ignore that for now.
>>
>>
>> Variable length instructions:
>>
>> To handle variable length instructions in x86 and ARM, the predecoder now
>> takes in the current by reference to the getExtMachInst function. It can
>> modify the PC however it needs to (by setting NPC to be the PC + instruction
>> length, for instance). This could be improved since the CPU doesn't know if
>> the PC was modified and always has to write it back.
>>
>>
>> ISA parser:
>>
>> To support the new API, all PC related operand types were removed from the
>> parser and replaced with a PCState type. There are two warts on this
>> implementation. First, as with all the other operand types, the PCState still
>> has to have a valid operand type even though it doesn't use it. Second, using
>> syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
>> syntax for operand type overriding, and the parser can't figure out if you're
>> reading or writing. Instructions that use the PCS operand (which I've
>> consistently called it) need to first read it into a local variable,
>> manipulate it, and then write it back out.
>>
>>
>> Return address stack:
>>
>> The return address stack needed a little extra help because, in the presence
>> of branch delay slots, it has to merge together elements of the return PC and
>> the call PC. To handle that, a buildRetPC utility function was added. There
>> are basically only two versions in all the ISAs, but it didn't seem short
>> enough to put into the generic ISA directory. Also, the branch predictor code
>> in O3 and InOrder were adjusted so that they always store the PC of the 
>> actual
>> call instruction in the RAS, not the next PC. If the call instruction is a
>> microop, the next PC refers to the next microop in the same macroop which is
>> probably not desirable. The buildRetPC function advances the PC intelligently
>> to the next macroop (in an ISA specific way) so that that case works.
>>
>>
>> Change in stats:
>>
>> There were no change in stats except in MIPS and SPARC in the O3 model. MIPS
>> runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which 
>> could
>> likely be improved further by setting call/return instruction flags and 
>> taking
>> advantage of the RAS.
>>
>>
>> TODO:
>>
>> Add != operators to the PCState classes, defined trivially to be !(a==b).
>> Smooth out places where PCs are split apart, passed around, and put back
>> together later. I think this might happen in SPARC's fault code. Add ISA
>> specific constructors that allow setting PC elements without calling a bunch
>> of accessors. Try to eliminate the need for the branching() function. Factor
>> out Alpha's PAL mode pc bit into a separate flag field, and eliminate places
>> where it's blindly masked out or tested in the PC.
>>
>>
>> Diffs
>> -----
>>
>>   src/arch/alpha/ev5.cc addb847910d2 
>>   src/arch/alpha/faults.cc addb847910d2 
>>   src/arch/alpha/interrupts.hh addb847910d2 
>>   src/arch/alpha/isa/branch.isa addb847910d2 
>>   src/arch/alpha/isa/decoder.isa addb847910d2 
>>   src/arch/alpha/isa/main.isa addb847910d2 
>>   src/arch/alpha/predecoder.hh addb847910d2 
>>   src/arch/alpha/process.cc addb847910d2 
>>   src/arch/alpha/remote_gdb.cc addb847910d2 
>>   src/arch/alpha/stacktrace.cc addb847910d2 
>>   src/arch/alpha/tlb.cc addb847910d2 
>>   src/arch/alpha/types.hh addb847910d2 
>>   src/arch/alpha/utility.hh addb847910d2 
>>   src/arch/alpha/utility.cc addb847910d2 
>>   src/arch/arm/faults.cc addb847910d2 
>>   src/arch/arm/insts/macromem.hh addb847910d2 
>>   src/arch/arm/insts/mem.hh addb847910d2 
>>   src/arch/arm/insts/pred_inst.hh addb847910d2 
>>   src/arch/arm/insts/static_inst.hh addb847910d2 
>>   src/arch/arm/insts/vfp.hh addb847910d2 
>>   src/arch/arm/isa.cc addb847910d2 
>>   src/arch/arm/isa/formats/breakpoint.isa addb847910d2 
>>   src/arch/arm/isa/insts/branch.isa addb847910d2 
>>   src/arch/arm/isa/insts/data.isa addb847910d2 
>>   src/arch/arm/isa/insts/ldr.isa addb847910d2 
>>   src/arch/arm/isa/insts/macromem.isa addb847910d2 
>>   src/arch/arm/isa/insts/misc.isa addb847910d2 
>>   src/arch/arm/isa/operands.isa addb847910d2 
>>   src/arch/arm/isa_traits.hh addb847910d2 
>>   src/arch/arm/linux/system.cc addb847910d2 
>>   src/arch/arm/nativetrace.cc addb847910d2 
>>   src/arch/arm/predecoder.hh addb847910d2 
>>   src/arch/arm/predecoder.cc addb847910d2 
>>   src/arch/arm/process.cc addb847910d2 
>>   src/arch/arm/system.hh addb847910d2 
>>   src/arch/arm/table_walker.cc addb847910d2 
>>   src/arch/arm/tlb.cc addb847910d2 
>>   src/arch/arm/types.hh addb847910d2 
>>   src/arch/arm/utility.hh addb847910d2 
>>   src/arch/arm/utility.cc addb847910d2 
>>   src/arch/generic/types.hh PRE-CREATION 
>>   src/arch/isa_parser.py addb847910d2 
>>   src/arch/mips/isa/base.isa addb847910d2 
>>   src/arch/mips/isa/decoder.isa addb847910d2 
>>   src/arch/mips/isa/formats/branch.isa addb847910d2 
>>   src/arch/mips/isa/includes.isa addb847910d2 
>>   src/arch/mips/isa/operands.isa addb847910d2 
>>   src/arch/mips/mt.hh addb847910d2 
>>   src/arch/mips/predecoder.hh addb847910d2 
>>   src/arch/mips/process.cc addb847910d2 
>>   src/arch/mips/types.hh addb847910d2 
>>   src/arch/mips/utility.hh addb847910d2 
>>   src/arch/mips/utility.cc addb847910d2 
>>   src/arch/power/insts/branch.hh addb847910d2 
>>   src/arch/power/insts/branch.cc addb847910d2 
>>   src/arch/power/insts/static_inst.hh addb847910d2 
>>   src/arch/power/isa/decoder.isa addb847910d2 
>>   src/arch/power/isa/formats/branch.isa addb847910d2 
>>   src/arch/power/isa/formats/unknown.isa addb847910d2 
>>   src/arch/power/isa/operands.isa addb847910d2 
>>   src/arch/power/predecoder.hh addb847910d2 
>>   src/arch/power/process.cc addb847910d2 
>>   src/arch/power/types.hh addb847910d2 
>>   src/arch/power/utility.hh addb847910d2 
>>   src/arch/power/utility.cc addb847910d2 
>>   src/arch/sparc/faults.cc addb847910d2 
>>   src/arch/sparc/isa/base.isa addb847910d2 
>>   src/arch/sparc/isa/decoder.isa addb847910d2 
>>   src/arch/sparc/isa/formats/branch.isa addb847910d2 
>>   src/arch/sparc/isa/formats/micro.isa addb847910d2 
>>   src/arch/sparc/isa/operands.isa addb847910d2 
>>   src/arch/sparc/nativetrace.cc addb847910d2 
>>   src/arch/sparc/predecoder.hh addb847910d2 
>>   src/arch/sparc/process.cc addb847910d2 
>>   src/arch/sparc/remote_gdb.cc addb847910d2 
>>   src/arch/sparc/types.hh addb847910d2 
>>   src/arch/sparc/utility.hh addb847910d2 
>>   src/arch/sparc/utility.cc addb847910d2 
>>   src/arch/x86/faults.cc addb847910d2 
>>   src/arch/x86/insts/macroop.hh addb847910d2 
>>   src/arch/x86/insts/microop.hh addb847910d2 
>>   src/arch/x86/insts/static_inst.hh addb847910d2 
>>   src/arch/x86/isa/decoder/two_byte_opcodes.isa addb847910d2 
>>   src/arch/x86/isa/formats/unknown.isa addb847910d2 
>>   src/arch/x86/isa/microops/regop.isa addb847910d2 
>>   src/arch/x86/isa/microops/seqop.isa addb847910d2 
>>   src/arch/x86/isa/operands.isa addb847910d2 
>>   src/arch/x86/nativetrace.cc addb847910d2 
>>   src/arch/x86/predecoder.hh addb847910d2 
>>   src/arch/x86/process.cc addb847910d2 
>>   src/arch/x86/system.cc addb847910d2 
>>   src/arch/x86/tlb.cc addb847910d2 
>>   src/arch/x86/types.hh addb847910d2 
>>   src/arch/x86/utility.hh addb847910d2 
>>   src/arch/x86/utility.cc addb847910d2 
>>   src/base/remote_gdb.cc addb847910d2 
>>   src/base/types.hh addb847910d2 
>>   src/cpu/base_dyn_inst.hh addb847910d2 
>>   src/cpu/base_dyn_inst_impl.hh addb847910d2 
>>   src/cpu/checker/cpu.hh addb847910d2 
>>   src/cpu/exetrace.hh addb847910d2 
>>   src/cpu/exetrace.cc addb847910d2 
>>   src/cpu/inorder/comm.hh addb847910d2 
>>   src/cpu/inorder/cpu.hh addb847910d2 
>>   src/cpu/inorder/cpu.cc addb847910d2 
>>   src/cpu/inorder/first_stage.cc addb847910d2 
>>   src/cpu/inorder/inorder_dyn_inst.hh addb847910d2 
>>   src/cpu/inorder/inorder_dyn_inst.cc addb847910d2 
>>   src/cpu/inorder/inorder_trace.hh addb847910d2 
>>   src/cpu/inorder/inorder_trace.cc addb847910d2 
>>   src/cpu/inorder/pipeline_stage.hh addb847910d2 
>>   src/cpu/inorder/pipeline_stage.cc addb847910d2 
>>   src/cpu/inorder/resource_pool.9stage.cc addb847910d2 
>>   src/cpu/inorder/resources/bpred_unit.hh addb847910d2 
>>   src/cpu/inorder/resources/bpred_unit.cc addb847910d2 
>>   src/cpu/inorder/resources/branch_predictor.cc addb847910d2 
>>   src/cpu/inorder/resources/cache_unit.cc addb847910d2 
>>   src/cpu/inorder/resources/execution_unit.cc addb847910d2 
>>   src/cpu/inorder/resources/fetch_seq_unit.hh addb847910d2 
>>   src/cpu/inorder/resources/fetch_seq_unit.cc addb847910d2 
>>   src/cpu/inorder/resources/tlb_unit.hh addb847910d2 
>>   src/cpu/inorder/thread_context.hh addb847910d2 
>>   src/cpu/inorder/thread_context.cc addb847910d2 
>>   src/cpu/inorder/thread_state.hh addb847910d2 
>>   src/cpu/inteltrace.hh addb847910d2 
>>   src/cpu/inteltrace.cc addb847910d2 
>>   src/cpu/legiontrace.hh addb847910d2 
>>   src/cpu/legiontrace.cc addb847910d2 
>>   src/cpu/nativetrace.hh addb847910d2 
>>   src/cpu/o3/bpred_unit.hh addb847910d2 
>>   src/cpu/o3/bpred_unit_impl.hh addb847910d2 
>>   src/cpu/o3/comm.hh addb847910d2 
>>   src/cpu/o3/commit.hh addb847910d2 
>>   src/cpu/o3/commit_impl.hh addb847910d2 
>>   src/cpu/o3/cpu.hh addb847910d2 
>>   src/cpu/o3/cpu.cc addb847910d2 
>>   src/cpu/o3/decode_impl.hh addb847910d2 
>>   src/cpu/o3/dep_graph.hh addb847910d2 
>>   src/cpu/o3/dyn_inst.hh addb847910d2 
>>   src/cpu/o3/dyn_inst_impl.hh addb847910d2 
>>   src/cpu/o3/fetch.hh addb847910d2 
>>   src/cpu/o3/fetch_impl.hh addb847910d2 
>>   src/cpu/o3/iew_impl.hh addb847910d2 
>>   src/cpu/o3/inst_queue_impl.hh addb847910d2 
>>   src/cpu/o3/lsq_unit.hh addb847910d2 
>>   src/cpu/o3/lsq_unit_impl.hh addb847910d2 
>>   src/cpu/o3/mem_dep_unit_impl.hh addb847910d2 
>>   src/cpu/o3/rename_impl.hh addb847910d2 
>>   src/cpu/o3/rob_impl.hh addb847910d2 
>>   src/cpu/o3/thread_context.hh addb847910d2 
>>   src/cpu/o3/thread_context_impl.hh addb847910d2 
>>   src/cpu/pc_event.cc addb847910d2 
>>   src/cpu/pred/btb.hh addb847910d2 
>>   src/cpu/pred/btb.cc addb847910d2 
>>   src/cpu/pred/ras.hh addb847910d2 
>>   src/cpu/pred/ras.cc addb847910d2 
>>   src/cpu/simple/atomic.cc addb847910d2 
>>   src/cpu/simple/base.hh addb847910d2 
>>   src/cpu/simple/base.cc addb847910d2 
>>   src/cpu/simple/timing.cc addb847910d2 
>>   src/cpu/simple_thread.hh addb847910d2 
>>   src/cpu/simple_thread.cc addb847910d2 
>>   src/cpu/static_inst.hh addb847910d2 
>>   src/cpu/static_inst.cc addb847910d2 
>>   src/cpu/thread_context.hh addb847910d2 
>>   src/cpu/thread_context.cc addb847910d2 
>>   src/kern/system_events.cc addb847910d2 
>>   src/kern/tru64/tru64.hh addb847910d2 
>>   src/sim/arguments.hh addb847910d2 
>>   src/sim/faults.cc addb847910d2 
>>   src/sim/insttracer.hh addb847910d2 
>>   src/sim/syscall_emul.hh addb847910d2 
>>   src/sim/syscall_emul.cc addb847910d2 
>>
>> Diff: http://reviews.m5sim.org/r/255/diff
>>
>>
>> Testing
>> -------
>>
>> Quick regressions
>> At various times long regressions
>> Running all regressions now
>> Trace output while debugging
>>
>>
>> Thanks,
>>
>> Gabe
>>
>>
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to