Here's roughly what's going on with the register index. The load
microop needs to store into register 1, and it needs to be sure it
stores into the version visible from "user" mode. It does that by
applying the intRegInMode function, which offsets the register index 1
by MODE_USER * the number of integer registers. Later, the
flattenIntIndex function is called to figure out unambiguously which
register a particular index refers to, given the current values of
various ISA state (specifically the CPU mode for ARM) or other
conditions flagged by putting the register index in a particular
range. You can see in the
"else" clause that the mode is re-extracted from the index using an
integer division and the offset of 1 is extracted using a mod. This is
then translated into the actual register visible from that mode with
that index. From this point forward, the CPU can pretend the integer
register file is one big flat contiguous space and totally ignore the
ISA's register indexing semantics.
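
The encode/decode round trip described above can be sketched as follows.
This is a simplified stand-in, not the actual M5 code: the namespace and
the modeOf/regOf helpers are made up for illustration, MODE_USER = 16 is
the ARM user-mode encoding, and NumIntRegs = 36 is an assumption chosen
because it is the value that makes 1 map to 577.

```cpp
#include <cassert>

namespace sketch {

// Assumed values: 36 is the NumIntRegs that turns 1 into 577 when
// MODE_USER is 16 (the ARM user-mode encoding).
const int NumIntRegs = 36;
const int MODE_USER = 16;

// Encode "register reg as seen from mode" into one index by shifting
// it up by mode * NumIntRegs.
int intRegInMode(int mode, int reg)
{
    assert(reg >= 0 && reg < NumIntRegs);
    return mode * NumIntRegs + reg;
}

// Recover the two halves again, the way the "else" clause is described:
// an integer division gives the mode, a mod gives the original index.
int modeOf(int idx) { return idx / NumIntRegs; }
int regOf(int idx) { return idx % NumIntRegs; }

} // namespace sketch
```

With these assumed constants, sketch::intRegInMode(sketch::MODE_USER, 1)
gives 577, and modeOf/regOf turn 577 back into mode 16 and register 1.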

One additional mechanism is at work when actually storing the register
index in the StaticInst object. There are really three different types
of register indices, integer, float and misc (which could have also been
called "other" or "control"), but these are all stored in the same array
with no flag to distinguish them. To be able to tell them apart later,
an offset is added to them so that the integer indices are all from 0 to
FP_Base_DepTag - 1, the floating point registers are all from
FP_Base_DepTag to Ctrl_Base_DepTag - 1 (inconsistently named for
historical reasons), and the misc registers are from Ctrl_Base_DepTag
and up. This is a fairly fragile system: if an index of one type, say
an integer index, is large enough, it spills into the fp or misc range
and is misidentified later. I'd like to replace this system with
multidimensional indices that track the type explicitly, but I won't get
into the specifics here.
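
A minimal sketch of how a type gets recovered from the range an index
falls in, under stated assumptions: the RegType/TypedIndex/untag names
are hypothetical, NumIntRegs = 36 is inferred from 1 -> 577, and
NumFloatRegs = 69 is picked only so that Ctrl_Base_DepTag comes out as
105, which is what the "Renamed misc reg 472" line for index 577 in the
trace suggests (577 - 472).

```cpp
#include <cassert>

namespace sketch {

// Assumed sizes, see the note above; not taken from the M5 headers.
const int NumIntRegs = 36;
const int NumFloatRegs = 69;

// The pre-patch layout: fp indices start right after NumIntRegs.
const int FP_Base_DepTag = NumIntRegs;
const int Ctrl_Base_DepTag = FP_Base_DepTag + NumFloatRegs;

enum RegType { IntReg, FloatReg, MiscReg };

struct TypedIndex {
    RegType type;
    int idx;
};

// Decide the register type purely from the range the tagged index
// falls in, then strip the offset that belongs to that range.
TypedIndex untag(int tagged)
{
    if (tagged < FP_Base_DepTag)
        return { IntReg, tagged };
    if (tagged < Ctrl_Base_DepTag)
        return { FloatReg, tagged - FP_Base_DepTag };
    return { MiscReg, tagged - Ctrl_Base_DepTag };
}

} // namespace sketch
```

With this layout, an integer index that has been shifted by a mode, such
as 577, is at or above Ctrl_Base_DepTag and so comes back as a misc
register, which is exactly the fragility described.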

A flaw in the combination of these two systems is why this particular
index isn't being handled correctly. The FP_Base_DepTag is being set to
NumIntRegs, but in reality because of the intRegInMode function, integer
indices can be a lot larger than that. In this particular case,
register 1 becomes 577, which is even bigger than Ctrl_Base_DepTag, so
it gets interpreted as a misc index. Misc indices aren't renamed and
are passed directly to the ISA object to interpret/use as array
indices. I think
the reason this works fine on the simple CPUs is that they always know
explicitly what type of register something is, so when they go to undo
the DepTag offset, they just pull out the right one automatically. O3
isn't able to do that. I have a patch attached that simply multiplies
the FP_Base_DepTag value by 32, since 31 is the largest MODE_* constant.
In a more final version I'd want to compute the base from the real
upper limit instead of relying on the fact that multiplying by 32
happens to give the right answer.
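
One way the "real upper limit" version could look, as a sketch rather
than the actual follow-up patch. MaxMode, MaxIntRegIndex and the Old/New
prefixes are hypothetical names; NumIntRegs = 36 and NumFloatRegs = 69
are assumptions carried over from the numbers in the trace.

```cpp
namespace sketch {

const int NumIntRegs = 36;   // assumed: makes 1 -> 577 with MODE_USER = 16
const int NumFloatRegs = 69; // assumed
const int MaxMode = 31;      // largest MODE_* constant, per the text

// Largest index that shifting by mode * NumIntRegs can produce.
const int MaxIntRegIndex = MaxMode * NumIntRegs + (NumIntRegs - 1);

// Pre-patch base, and a base derived from the largest mode instead of
// a hard-coded multiplier.
const int OldFP_Base_DepTag = NumIntRegs;
const int NewFP_Base_DepTag = NumIntRegs * (MaxMode + 1);
const int NewCtrl_Base_DepTag = NewFP_Base_DepTag + NumFloatRegs;

// 577 does not fit below the old base, which is the bug.
static_assert(577 >= OldFP_Base_DepTag, "old base is too small for 577");
// Every mode-shifted integer index fits below the new base.
static_assert(MaxIntRegIndex < NewFP_Base_DepTag,
              "integer indices must stay below the fp range");
// With MaxMode = 31 this is the same value the attached patch uses.
static_assert(NewFP_Base_DepTag == NumIntRegs * 32,
              "matches the multiply-by-32 patch");

} // namespace sketch
```

The static_asserts make the layout fail at compile time if a later
change to the mode constants or register counts breaks the assumption.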

I'm not completely convinced this will solve your segfault, though. It
looks like that is caused by a bad DynInst pointer ending up in one of
the TimeBuffer structures used to pass values between stages. When the
RefCountingPtr holding it is destroyed and tries to decrement the
reference count, the junk pointer is dereferenced and causes a
segfault. You can tell the pointer is bad because in 64 bit x86,
pointers have to be canonical, or in other words be sign extended
beyond the largest implemented virtual address bit. Anything with 8 as
its most significant hex digit is pretty much guaranteed to be bogus.
It could still be, however, that writing beyond the end of a
register file trampled on that structure and corrupted the pointer.
There are asserts in the ISA object's functions that should prevent
that, but those would be disabled if you were running with m5.fast (not
recommended unless you're 100% certain everything is working).
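
The canonical-address rule can be checked mechanically. The sketch below
assumes 48 implemented virtual address bits, which is common on x86-64
but not universal; isCanonical48 is a made-up helper name.

```cpp
#include <cstdint>

namespace sketch {

// Assumes 48 implemented virtual address bits. An address is then
// canonical when bits 63..47 are all zeros or all ones, i.e. bit 47 is
// sign extended through the top of the word.
bool isCanonical48(std::uint64_t addr)
{
    const std::uint64_t top = addr >> 47; // the 17 bits 63..47
    return top == 0 || top == 0x1FFFF;
}

} // namespace sketch
```

The this value in frame #0 of the backtrace, 0x8d4810708d48c84d, fails
this check, while the this values in the other frames (for example
0x9cd200) pass it.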

Gabe

Min Kyu Jeong wrote:
> The following is the excerpt from the disassembly.
>
> 117c: e321f013 msr CPSR_c, #19 ; 0x13
> 1180: e24fd08c sub sp, pc, #140 ; 0x8c
> 1184: e321f011 msr CPSR_c, #17 ; 0x11
> 1188: e24fd094 sub sp, pc, #148 ; 0x94
> 118c: e321f012 msr CPSR_c, #18 ; 0x12
> 1190: e24fd09c sub sp, pc, #156 ; 0x9c
> 1194: e321f01b msr CPSR_c, #27 ; 0x1b
> 1198: e24fd0a4 sub sp, pc, #164 ; 0xa4
> 119c: e321f017 msr CPSR_c, #23 ; 0x17
> 11a0: e24fd0ac sub sp, pc, #172 ; 0xac
> 11a4: e321f01f msr CPSR_c, #31 ; 0x1f
> 11a8: e24fd0b4 sub sp, pc, #180 ; 0xb4
> 11ac: e321f013 msr CPSR_c, #19 ; 0x13
> 11b0: ea000002 b 11c0 <skipLabel_00000002>
>
> 000011b4 <LabStr_00000002>:
> 11b4: 5f444441 69736162 00315f63 ADD_basic_1.
>
>
> After x11b0, the branch is predicted fall-through and the string label is
> fetched. The particular bytes that cause the mess are at address 11b8,
> so I think it is the second 4B chunk of the label: x69736162
>
> The following is the relevant part of trace from the run. There are
> some additional prints that I added.
>
> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b4 (0) created
> [sn:1585]
> 17680000: system.cpu.fetch: [tid:0]: Instruction is: svcpl
> 17680000: global: MicroLdrUop, regIdx : 577
> 17680000: global: MicroLdrUop, regIdx : 581
> 17680000: global: MicroLdrUop, regIdx : 582
> 17680000: global: MicroLdrUop, regIdx : 584
> 17680000: global: MicroLdrUop, regIdx : 589
> 17680000: global: MicroLdrUop, regIdx : 590
> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (0) created
> [sn:1586]
> 17680000: system.cpu.fetch: [tid:0]: Instruction is: addi_uopvs
> r34, r3, #0
> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (1) created
> [sn:1587]
> 17680000: system.cpu.fetch: [tid:0]: Instruction is: subi_uopvs
> r3, r3, #24
> 17680000: system.cpu.fetch: [tid:0]: Done fetching, reached fetch
> bandwidth for this cycle.
> 17680000: system.cpu.fetch: [tid:0]: Setting PC to 0x0011b8.
> ...
> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (2) created
> [sn:1588]
> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
> XXX, [r34, #24]
> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (3) created
> [sn:1589]
> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
> XXX, [r34, #20]
> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (4) created
> [sn:1590]
> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
> XXX, [r34, #16]
> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (5) created
> [sn:1591]
> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
> XXX, [r34, #12]
> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (6) created
> [sn:1592]
> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
> XXX, [r34, #8]
> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (7) created
> [sn:1593]
> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
> XXX, [r34, #4]
> ...
> 18440000: system.cpu.rename: [tid:0]: Processing instruction [sn:1588]
> with PC 0x11b8.
> 18440000: system.cpu.rename: Adjusting reg index from 105 to 105.
> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 105, got
> physical reg 512.
> 18440000: system.cpu.rename: [tid:0]: Register 512 is ready.
> 18440000: global: [sn:1588] has 1 ready out of 4 sources. RTI 0)
> 18440000: system.cpu.rename: Flattening index 35 to 35.
> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 35, got
> physical reg 147.
> 18440000: system.cpu.rename: [tid:0]: Register 147 is ready.
> 18440000: global: [sn:1588] has 2 ready out of 4 sources. RTI 0)
> 18440000: system.cpu.rename: Flattening index 34 to 34.
> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 34, got
> physical reg 72.
> 18440000: system.cpu.rename: [tid:0]: Register 72 is not ready.
> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 577, got
> physical reg 984.
> 18440000: system.cpu.rename: [tid:0]: Register 984 is ready.
> 18440000: global: [sn:1588] has 3 ready out of 4 sources. RTI 0)
> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
> 18440000: global: Renamed misc reg 472
> *18440000: global: Renamed reg 472 to physical reg 984 old mapping was
> 984*
> *18440000: system.cpu.rename: [tid:0]: Renaming arch reg 577 to
> physical reg 984.*
> 18440000: system.cpu.rename: [tid:0]: Adding instruction to history
> buffer (size=3), [sn:1588].
>
>
>
> On Fri, Jul 9, 2010 at 4:23 PM, Gabriel Michael Black
> <[email protected] <mailto:[email protected]>> wrote:
>
> Thanks for the extra info which should be very helpful. Can you
> please tell us what the actual bytes are for the junk instruction?
>
>
> Gabe
>
> Quoting Min Kyu Jeong <[email protected] <mailto:[email protected]>>:
>
> I looked into this thing, but still don't fully understand how the
> out-of-bound reg index causes segfault. Instead, I will just
> describe what
> is happening hoping someone would catch a clue from it.
>
> The register index that goes out of bound is the architectural
> register
> index, stored in StaticInst class _destRegIdx[0]. The
> particular StaticInst
> I am getting this from is MicroLdrUop. In the constructor of
> the MacroMemOp
> (This invalid garbage instruction from mispredicted path is
> decoded as
> LdmStm), MicroLdrUop instances are generated. The destination
> register
> indices for uops are generated from the bit vector, and there
> is this bit of
> code
>
> if (force_user) {
>     regIdx = intRegInMode(MODE_USER, regIdx);
> }
>
> that changes regIdx from 1 to 577. This is stored in
> _destRegIdx[0] variable
> of the MicroLdrUop StaticInst. During renaming, 577 is renamed
> to 984 = 577
> - numLogicalRegs + numPhysicalRegs
>
> This intRegInMode() call is what I found suspicious, since the reg
> window is handled by the ArmISA::flattenIntIndex() call.
>
> Anyway, the simulation segfaults during the advance() function call of
> the timebuffer at the end of the NEXT tick. The following is the call
> stack.
>
>
> #0 0x000000000040a2fa in RefCounted::decref
> (this=0x8d4810708d48c84d) at
> build/ARM_FS/base/refcnt.hh:51
> #1 0x0000000000761daa in
> RefCountingPtr<BaseO3DynInst<O3CPUImpl> >::del
> (this=0x9cd200) at build/ARM_FS/base/refcnt.hh:69
> #2 0x0000000000761dc1 in ~RefCountingPtr (this=0x9cd200) at
> build/ARM_FS/base/refcnt.hh:85
> #3 0x000000000077ebf9 in ~commitComm (this=0x9cd1a8) at
> build/ARM_FS/cpu/o3/comm.hh:153
> #4 0x000000000077ec49 in ~TimeBufStruct (this=0x9ccec8) at
> build/ARM_FS/cpu/o3/comm.hh:110
> #5 0x00000000007884c4 in TimeBuffer<TimeBufStruct<O3CPUImpl>
> >::advance
> (this=0x1991038) at build/ARM_FS/base/timebuf.hh:187
> #6 0x0000000000798c31 in FullO3CPU<O3CPUImpl>::tick
> (this=0x198d310) at
> build/ARM_FS/cpu/o3/cpu.cc:523
>
> I made this segfault go away by overriding the idx 577 to 0.
>
> Thanks,
>
> On Wed, Jun 30, 2010 at 2:17 PM, Gabriel Michael Black <
> [email protected] <mailto:[email protected]>> wrote:
>
> Could you be more specific? There are a lot of register
> related indexes and
> I'm not sure exactly which ones you're talking about.
> Could you walk through
> what's happening from the illegal encoding, through the
> decoder, through the
> CPU and up to the segfault? I don't totally understand the
> mechanics of the
> failure at the moment, but my gut reaction is that the
> decoder should have
> returned an "Unknown(machInst)" when the index was out of
> bounds. I'm not
> convinced that's what's happening, though.
>
>
> Gabe
>
> Quoting Min Kyu Jeong <[email protected]
> <mailto:[email protected]>>:
>
> I just found a case somewhat related to this. Not exactly
> an assertion,
>
> but
> a segfault from the mispredicted path (non)instructions.
>
> When the operand register index is out of the range,
> the call to
> timeBuffer.advance() right after the renaming of such
> registers causes
> segfault . I bypassed this problem by making that
> out-of-bound register
> index to ZERO registers during the renaming (more
> particularly, during
> index
> flattening). I think raising a fault would be a better
> solution, but
> holding
> off from actually doing it. Any suggestion would be
> appreciated.
>
> ps. is the [m5-dev] tag in the title added by the
> mailing list, or should
> I
> add it myself?
>
> Thanks,
>
> Min
>
> On Mon, Jun 14, 2010 at 3:52 PM, Gabriel Michael Black <
> [email protected] <mailto:[email protected]>>
> wrote:
>
> It's important to distinguish between M5 making
> sense, and the code it's
>
> executing making sense. We shouldn't (and I hope
> don't) have any asserts
> that check conditions controllable from the
> simulated code since those
> should generally just cause a fault and may, as
> you point out, be
> misspeculated. It's fine to check that M5 is
> internally consistent,
> though.
> This is supposed to work in all the CPU models and
> as far as I know
> generally does. M5's CPU models should, to the
> first order, correctly do
> whatever whacky, nonsensical things the
> instruction memory tells it to do
> without complaining. If you've found a case where
> it doesn't (which has
> happened before) please let us know so we can fix it.
>
> Gabe
>
>
> Quoting Min Kyu Jeong <[email protected]
> <mailto:[email protected]>>:
>
> Is it possible that the speculatively fetched
> instructions can cause
>
> programming assertions to fail? Until a branch
> is resolved, whatever
> (even
> non-instructions) in the predicted path could
> be fetched and decoded.
> Can't
> assertions on instruction sanity fail for those?
>
> I am trying to get the O3 CPU model working for ARM. In many cases the
> first instruction is a branch followed by an interrupt vector table. I
> was wondering whether such cases exist for other CPU models and, if
> so, how they are handled.
>
> Thanks,
>
> Min
>
>
>
> _______________________________________________
> m5-dev mailing list
> [email protected] <mailto:[email protected]>
> http://m5sim.org/mailman/listinfo/m5-dev
>
ARM: Adjust the FP_Base_DepTag to be larger than the largest int reg index.
diff -r aab017d1adc6 -r 44fdd0b065bd src/arch/arm/registers.hh
--- a/src/arch/arm/registers.hh Thu Jul 08 17:56:13 2010 -0700
+++ b/src/arch/arm/registers.hh Mon Jul 12 00:46:53 2010 -0700
@@ -83,7 +83,7 @@
 const int SyscallSuccessReg = ReturnValueReg;
 // These help enumerate all the registers for dependence tracking.
-const int FP_Base_DepTag = NumIntRegs;
+const int FP_Base_DepTag = NumIntRegs * 32;
 const int Ctrl_Base_DepTag = FP_Base_DepTag + NumFloatRegs;
 typedef union {