I will try it tomorrow and let you know.

On Mon, Jul 12, 2010 at 6:51 PM, Gabriel Michael Black <
[email protected]> wrote:

> Did that patch fix it?
>
> Gabe
>
>
> Quoting Gabe Black <[email protected]>:
>
>  Here's more or less what's going on as far as the register index. The
>> load microop needs to store into register 1, and it needs to be sure it
>> stores into the version visible from the "user" mode. It does that by
>> applying the intRegInMode function which shifts the register index 1 by
>> MODE_USER * the number of integer registers. Later, the flattenIntIndex
>> function is called to unambiguously figure out what a particular
>> register index goes with given the current values of various ISA state
>> (specifically the CPU mode for ARM) or other conditions flagged by
>> putting the register index in a particular range. You can see in the
>> "else" clause that the mode is re-extracted from the index using an
>> integer division and the offset of 1 is extracted using a mod. This is
>> then translated into the actual register visible from that mode with
>> that index. From this point forward, the CPU can pretend the integer
>> register file is one big flat contiguous space and totally ignore the
>> ISA's register indexing semantics.
>>
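The encode/decode round trip described above can be sketched as follows. This is only an illustration: `encodeRegInMode`, `decodeMode` and `decodeReg` are hypothetical names rather than the real M5 functions, `MODE_USER = 0x10` is the ARM user-mode encoding, and `NUM_INT_REGS = 36` is inferred from the 1 -> 577 mapping seen in the trace, so the real constant may differ.

```cpp
// Assumed values for illustration only (see the note above).
constexpr int MODE_USER = 0x10;     // ARM user-mode encoding
constexpr int NUM_INT_REGS = 36;    // inferred from 1 -> 577, not from the source tree

// Hypothetical stand-in for the shift by "mode * number of integer registers".
constexpr int encodeRegInMode(int mode, int reg)
{
    return mode * NUM_INT_REGS + reg;
}

// Hypothetical stand-ins for the division and mod in the "else" clause.
constexpr int decodeMode(int idx) { return idx / NUM_INT_REGS; }
constexpr int decodeReg(int idx) { return idx % NUM_INT_REGS; }

static_assert(encodeRegInMode(MODE_USER, 1) == 577, "matches the index in the trace");
static_assert(decodeMode(577) == MODE_USER, "mode is recovered by integer division");
static_assert(decodeReg(577) == 1, "register offset is recovered by mod");
```

Under these assumptions, register 1 tagged for user mode becomes 577, and the division and mod recover the mode and the register again.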
>> One additional mechanism is at work when actually storing the register
>> index in the StaticInst object. There are really three different types
>> of register indices, integer, float and misc (which could have also been
>> called "other" or "control"), but these are all stored in the same array
>> with no flag to distinguish them. To be able to tell them apart later,
>> an offset is added to them so that the integer indexes are all from 0 to
>> FP_Base_DepTag - 1, the floating point registers are all from
>> FP_Base_DepTag to Ctrl_Base_DepTag - 1 (inconsistently named for
>> historical reasons), and the misc registers are from Ctrl_Base_DepTag
>> and up. This is a fairly fragile system since if a, say, integer index
>> is large enough, it might spill into the fp or misc range and be
>> misidentified later. I'd like to replace this system with
>> multidimensional indices that track the type explicitly, but I won't get
>> into the specifics here.
>>
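A minimal sketch of the range-based tagging described above, and of how a large integer index spills into the misc range. The boundary values are assumptions inferred from the trace in this thread (577 is reported as misc reg 472, which implies a misc base of 105; the FP base is taken to be 36), and `classify` and `miscOffset` are hypothetical helpers, not M5 functions.

```cpp
enum class RegClass { Int, Float, Misc };

// Assumed boundaries, inferred from the trace rather than copied from the source.
constexpr int FP_Base_DepTag = 36;
constexpr int Ctrl_Base_DepTag = 105;

// Hypothetical helper: decide the register type purely from the index range.
constexpr RegClass classify(int idx)
{
    return idx < FP_Base_DepTag ? RegClass::Int
         : idx < Ctrl_Base_DepTag ? RegClass::Float
         : RegClass::Misc;
}

// Hypothetical helper: undo the misc offset, as a consumer of the index would.
constexpr int miscOffset(int idx) { return idx - Ctrl_Base_DepTag; }

static_assert(classify(1) == RegClass::Int, "plain r1 is an integer index");
static_assert(classify(36) == RegClass::Float, "first index past the integer range");
static_assert(classify(577) == RegClass::Misc, "mode-tagged r1 is misidentified");
static_assert(miscOffset(577) == 472, "matches 'Renamed misc reg 472' in the trace");
```

With nothing but the range to go on, the mode-tagged integer index 577 is indistinguishable from misc register 472.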
>> A flaw in the combination of these two systems is why this particular
>> index isn't being handled correctly. The FP_Base_DepTag is being set to
>> NumIntRegs, but in reality because of the intRegInMode function, integer
>> indices can be a lot larger than that. When this particular index is
>> being processed, 1->577 is even bigger than Ctrl_Base_DepTag, so it gets
>> interpreted as a misc index. These aren't renamed and are passed
>> directly to the ISA object to interpret/use as array indexes. I think
>> the reason this works fine on the simple CPUs is that they always know
>> explicitly what type of register something is, so when they go to undo
>> the DepTag offset, they just pull out the right one automatically. O3
>> isn't able to do that. I have a patch attached that simply multiplies
>> the FP_Base_DepTag value by 32 since 31 is the largest MODE_* constant.
>> In a more final version I'd want to use the real upper limit instead of
>> relying on the fact that multiplying by 32 happens to give the right
>> answer.
>>
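The arithmetic behind the factor of 32 can be checked with a short sketch. `NumIntRegs = 36` is again an assumption inferred from the 1 -> 577 mapping, and the variable names here are illustrative rather than taken from the patch.

```cpp
// Assumed for illustration: 36 integer registers per mode (inferred from the trace).
constexpr int NumIntRegs = 36;
constexpr int MaxMode = 31;   // largest MODE_* constant, per the message above

// Largest index that mode tagging can produce: highest mode, highest register.
constexpr int largestIntIndex = MaxMode * NumIntRegs + (NumIntRegs - 1);

constexpr int oldFpBase = NumIntRegs;           // FP_Base_DepTag before the patch
constexpr int patchedFpBase = NumIntRegs * 32;  // FP_Base_DepTag after the patch

static_assert(577 >= oldFpBase, "before: 577 falls outside the integer range");
static_assert(577 < patchedFpBase, "after: 577 stays inside the integer range");
static_assert(largestIntIndex < patchedFpBase, "every mode-tagged index now fits");
```

Under these assumptions the largest mode-tagged index is 1151 and the patched base is 1152, so the integer range is exactly large enough.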
>> I'm not completely convinced this will solve your segfault, though. It
>> looks like that is caused by a bad DynInst pointer ending up in one of
>> the TimeBuffer structures used to pass values between stages. When that
>> goes out of scope and attempts to reference count itself, the junk
>> pointer is dereferenced and causes a segfault. You can tell the pointer
>> is bad because in 64 bit x86, pointers have to be canonical, or in other
>> words be sign extended beyond the largest implemented virtual address
>> bit. Anything starting with an 8 as the MSB is pretty much guaranteed to
>> be bogus. It could still be, however, that writing beyond the end of a
>> register file trampled on that structure and corrupted the pointer.
>> There are asserts in the ISA object's functions that should prevent
>> that, but those would be disabled if you were running with m5.fast (not
>> recommended unless you're 100% certain everything is working).
>>
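The canonical-address test mentioned above can be written down as a small check. This sketch assumes 48 implemented virtual address bits (the usual x86-64 configuration at the time); `isCanonical48` is a hypothetical helper name.

```cpp
#include <cstdint>

// Assuming 48-bit virtual addresses: bits 63..47 must all equal bit 47,
// so the top 17 bits are either all zeros or all ones.
constexpr bool isCanonical48(std::uint64_t addr)
{
    return (addr >> 47) == 0 || (addr >> 47) == 0x1ffff;
}

static_assert(!isCanonical48(0x8d4810708d48c84dULL), "'this' in frame #0 is junk");
static_assert(isCanonical48(0x00000000009cd200ULL), "'this' in frame #1 is plausible");
static_assert(isCanonical48(0xffff800000000000ULL), "lowest high-half address");
static_assert(!isCanonical48(0x0000800000000000ULL), "bit 47 set without sign extension");
```

The `this` value in frame #0 of the backtrace fails the check, while the pointers in the other frames pass it.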
>> Gabe
>>
>> Min Kyu Jeong wrote:
>>
>>> The following is the excerpt from the disassembly.
>>>
>>>        117c: e321f013 msr CPSR_c, #19 ; 0x13
>>>        1180: e24fd08c sub sp, pc, #140 ; 0x8c
>>>        1184: e321f011 msr CPSR_c, #17 ; 0x11
>>>        1188: e24fd094 sub sp, pc, #148 ; 0x94
>>>        118c: e321f012 msr CPSR_c, #18 ; 0x12
>>>        1190: e24fd09c sub sp, pc, #156 ; 0x9c
>>>        1194: e321f01b msr CPSR_c, #27 ; 0x1b
>>>        1198: e24fd0a4 sub sp, pc, #164 ; 0xa4
>>>        119c: e321f017 msr CPSR_c, #23 ; 0x17
>>>        11a0: e24fd0ac sub sp, pc, #172 ; 0xac
>>>        11a4: e321f01f msr CPSR_c, #31 ; 0x1f
>>>        11a8: e24fd0b4 sub sp, pc, #180 ; 0xb4
>>>        11ac: e321f013 msr CPSR_c, #19 ; 0x13
>>>        11b0: ea000002 b 11c0 <skipLabel_00000002>
>>>
>>>    000011b4 <LabStr_00000002>:
>>>        11b4: 5f444441 69736162 00315f63              ADD_basic_1.
>>>
>>>
>>> After 0x11b0, the branch is predicted fall-through and the string label
>>> is fetched. The bytes that cause the mess are at address 0x11b8, so I
>>> think it is the second 4-byte chunk of the label: 0x69736162
>>>
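As a cross-check, the word 0x69736162 can be pulled apart by hand using the classic ARM block data transfer (LDM/STM) field layout. This is a standalone sketch, not the M5 decoder; `BlockXfer`, `decode` and `popcount16` are hypothetical names.

```cpp
#include <cstdint>

// Fields of an ARM block data transfer (LDM/STM) instruction word.
struct BlockXfer {
    std::uint32_t cond;     // bits 31:28, condition code
    bool isBlockXfer;       // bits 27:25 == 0b100
    bool sBit;              // bit 22, selects user-mode registers when PC is not loaded
    bool load;              // bit 20, 1 = LDM, 0 = STM
    std::uint32_t rn;       // bits 19:16, base register
    std::uint32_t regList;  // bits 15:0, one bit per register
};

constexpr BlockXfer decode(std::uint32_t w)
{
    return BlockXfer{ w >> 28, ((w >> 25) & 7u) == 4u, ((w >> 22) & 1u) != 0,
                      ((w >> 20) & 1u) != 0, (w >> 16) & 0xfu, w & 0xffffu };
}

// Count the set bits, i.e. the number of registers transferred.
constexpr int popcount16(std::uint32_t v)
{
    return v == 0 ? 0 : static_cast<int>(v & 1u) + popcount16(v >> 1);
}

static_assert(decode(0x69736162u).cond == 6, "VS, as in the 'vs' uop suffix");
static_assert(decode(0x69736162u).isBlockXfer, "decoded as a load/store multiple");
static_assert(decode(0x69736162u).sBit && decode(0x69736162u).load, "LDM with S set");
static_assert(decode(0x69736162u).rn == 3, "base register r3");
static_assert(decode(0x69736162u).regList == 0x6162u, "r1, r5, r6, r8, r13, r14");
static_assert(popcount16(0x6162u) * 4 == 24, "six words, matching the #24 adjustment");
```

The register list selects r1, r5, r6, r8, r13 and r14, which lines up with the regIdx values 577, 581, 582, 584, 589 and 590 in the trace (each is 576 plus the register number).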
>>> The following is the relevant part of trace from the run. There are
>>> some additional prints that I added.
>>>
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b4 (0) created
>>> [sn:1585]
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is:   svcpl
>>> 17680000: global: MicroLdrUop, regIdx : 577
>>> 17680000: global: MicroLdrUop, regIdx : 581
>>> 17680000: global: MicroLdrUop, regIdx : 582
>>> 17680000: global: MicroLdrUop, regIdx : 584
>>> 17680000: global: MicroLdrUop, regIdx : 589
>>> 17680000: global: MicroLdrUop, regIdx : 590
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (0) created
>>> [sn:1586]
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is:   addi_uopvs
>>> r34, r3, #0
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (1) created
>>> [sn:1587]
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is:   subi_uopvs
>>> r3, r3, #24
>>> 17680000: system.cpu.fetch: [tid:0]: Done fetching, reached fetch
>>> bandwidth for this cycle.
>>> 17680000: system.cpu.fetch: [tid:0]: Setting PC to 0x0011b8.
>>> ...
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (2) created
>>> [sn:1588]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
>>> XXX, [r34, #24]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (3) created
>>> [sn:1589]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
>>> XXX, [r34, #20]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (4) created
>>> [sn:1590]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
>>> XXX, [r34, #16]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (5) created
>>> [sn:1591]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
>>> XXX, [r34, #12]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (6) created
>>> [sn:1592]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
>>> XXX, [r34, #8]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (7) created
>>> [sn:1593]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
>>> XXX, [r34, #4]
>>> ...
>>> 18440000: system.cpu.rename: [tid:0]: Processing instruction [sn:1588]
>>> with PC 0x11b8.
>>> 18440000: system.cpu.rename: Adjusting reg index from 105 to 105.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 105, got
>>> physical reg 512.
>>> 18440000: system.cpu.rename: [tid:0]: Register 512 is ready.
>>> 18440000: global: [sn:1588] has 1 ready out of 4 sources. RTI 0)
>>> 18440000: system.cpu.rename: Flattening index 35 to 35.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 35, got
>>> physical reg 147.
>>> 18440000: system.cpu.rename: [tid:0]: Register 147 is ready.
>>> 18440000: global: [sn:1588] has 2 ready out of 4 sources. RTI 0)
>>> 18440000: system.cpu.rename: Flattening index 34 to 34.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 34, got
>>> physical reg 72.
>>> 18440000: system.cpu.rename: [tid:0]: Register 72 is not ready.
>>> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 577, got
>>> physical reg 984.
>>> 18440000: system.cpu.rename: [tid:0]: Register 984 is ready.
>>> 18440000: global: [sn:1588] has 3 ready out of 4 sources. RTI 0)
>>> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
>>> 18440000: global: Renamed misc reg 472
>>> *18440000: global: Renamed reg 472 to physical reg 984 old mapping was
>>> 984*
>>> *18440000: system.cpu.rename: [tid:0]: Renaming arch reg 577 to
>>> physical reg 984.*
>>> 18440000: system.cpu.rename: [tid:0]: Adding instruction to history
>>> buffer (size=3), [sn:1588].
>>>
>>>
>>>
>>> On Fri, Jul 9, 2010 at 4:23 PM, Gabriel Michael Black
>>> <[email protected] <mailto:[email protected]>> wrote:
>>>
>>>    Thanks for the extra info which should be very helpful. Can you
>>>    please tell us what the actual bytes are for the junk instruction?
>>>
>>>
>>>    Gabe
>>>
>>>    Quoting Min Kyu Jeong <[email protected] <mailto:[email protected]>>:
>>>
>>>        I looked into this, but I still don't fully understand how the
>>>        out-of-bound reg index causes the segfault. Instead, I will just
>>>        describe what is happening, hoping someone can catch a clue
>>>        from it.
>>>
>>>        The register index that goes out of bounds is the architectural
>>>        register index, stored in the StaticInst class's _destRegIdx[0].
>>>        The particular StaticInst I am getting this from is MicroLdrUop.
>>>        MicroLdrUop instances are generated in the constructor of the
>>>        MacroMemOp (this invalid garbage instruction from the
>>>        mispredicted path is decoded as LdmStm). The destination
>>>        register indices for the uops are generated from the bit vector,
>>>        and there is this bit of code
>>>
>>>        if (force_user) {
>>>          regIdx = intRegInMode(MODE_USER, regIdx);
>>>        }
>>>
>>>        that changes regIdx from 1 to 577. This is stored in the
>>>        _destRegIdx[0] variable of the MicroLdrUop StaticInst. During
>>>        renaming, 577 is renamed to 984 = 577 - numLogicalRegs +
>>>        numPhysicalRegs.
>>>
>>>        This intRegInMode() call is what I found suspicious, since the
>>>        reg window is handled by the ArmISA::flattenIntIndex() call.
>>>
>>>        Anyway, the simulation segfaults during the advance() call on
>>>        the TimeBuffer at the end of the NEXT tick. The following is the
>>>        call stack.
>>>
>>>
>>>        #0  0x000000000040a2fa in RefCounted::decref
>>>        (this=0x8d4810708d48c84d) at
>>>        build/ARM_FS/base/refcnt.hh:51
>>>        #1  0x0000000000761daa in
>>>        RefCountingPtr<BaseO3DynInst<O3CPUImpl> >::del
>>>        (this=0x9cd200) at build/ARM_FS/base/refcnt.hh:69
>>>        #2  0x0000000000761dc1 in ~RefCountingPtr (this=0x9cd200) at
>>>        build/ARM_FS/base/refcnt.hh:85
>>>        #3  0x000000000077ebf9 in ~commitComm (this=0x9cd1a8) at
>>>        build/ARM_FS/cpu/o3/comm.hh:153
>>>        #4  0x000000000077ec49 in ~TimeBufStruct (this=0x9ccec8) at
>>>        build/ARM_FS/cpu/o3/comm.hh:110
>>>        #5  0x00000000007884c4 in TimeBuffer<TimeBufStruct<O3CPUImpl>
>>>        >::advance
>>>        (this=0x1991038) at build/ARM_FS/base/timebuf.hh:187
>>>        #6  0x0000000000798c31 in FullO3CPU<O3CPUImpl>::tick
>>>        (this=0x198d310) at
>>>        build/ARM_FS/cpu/o3/cpu.cc:523
>>>
>>>        I made this segfault go away by overriding the idx 577 to 0.
>>>
>>>        Thanks,
>>>
>>>        On Wed, Jun 30, 2010 at 2:17 PM, Gabriel Michael Black <
>>>        [email protected] <mailto:[email protected]>> wrote:
>>>
>>>            Could you be more specific? There are a lot of
>>>            register-related indexes and I'm not sure exactly which ones
>>>            you're talking about. Could you walk through what's happening
>>>            from the illegal encoding, through the decoder, through the
>>>            CPU and up to the segfault? I don't totally understand the
>>>            mechanics of the failure at the moment, but my gut reaction
>>>            is that the decoder should have returned an
>>>            "Unknown(machInst)" when the index was out of bounds. I'm not
>>>            convinced that's what's happening, though.
>>>
>>>
>>>            Gabe
>>>
>>>            Quoting Min Kyu Jeong <[email protected]
>>>            <mailto:[email protected]>>:
>>>
>>>                I just found a case somewhat related to this. Not exactly
>>>                an assertion, but a segfault from the mispredicted path
>>>                (non)instructions.
>>>
>>>                When the operand register index is out of range, the call
>>>                to timeBuffer.advance() right after the renaming of such
>>>                registers causes a segfault. I bypassed this problem by
>>>                mapping that out-of-bound register index to the ZERO
>>>                register during the renaming (more particularly, during
>>>                index flattening). I think raising a fault would be a
>>>                better solution, but I am holding off from actually doing
>>>                it. Any suggestions would be appreciated.
>>>
>>>                P.S. Is the [m5-dev] tag in the title added by the
>>>                mailing list, or should I add it myself?
>>>
>>>                Thanks,
>>>
>>>                Min
>>>
>>>                On Mon, Jun 14, 2010 at 3:52 PM, Gabriel Michael Black <
>>>                [email protected] <mailto:[email protected]>>
>>>                wrote:
>>>
>>>                    It's important to distinguish between M5 making sense
>>>                    and the code it's executing making sense. We shouldn't
>>>                    (and I hope don't) have any asserts that check
>>>                    conditions controllable from the simulated code, since
>>>                    those should generally just cause a fault and may, as
>>>                    you point out, be misspeculated. It's fine to check
>>>                    that M5 is internally consistent, though. This is
>>>                    supposed to work in all the CPU models and as far as I
>>>                    know generally does. M5's CPU models should, to the
>>>                    first order, correctly do whatever whacky, nonsensical
>>>                    things the instruction memory tells them to do without
>>>                    complaining. If you've found a case where it doesn't
>>>                    (which has happened before) please let us know so we
>>>                    can fix it.
>>>
>>>                    Gabe
>>>
>>>
>>>                    Quoting Min Kyu Jeong <[email protected]
>>>                    <mailto:[email protected]>>:
>>>
>>>                        Is it possible that speculatively fetched
>>>                        instructions can cause programming assertions to
>>>                        fail? Until a branch is resolved, whatever (even
>>>                        non-instructions) is in the predicted path could
>>>                        be fetched and decoded. Can't assertions on
>>>                        instruction sanity fail for those?
>>>
>>>                        I am trying to get the O3 CPU model working for
>>>                        ARM. In many cases the first instruction is a
>>>                        branch followed by an interrupt vector table. I
>>>                        was wondering whether such cases exist for other
>>>                        CPU models and, if so, how they are handled.
>>>
>>>                        Thanks,
>>>
>>>                        Min
>>>
>>>
>>>
>>>                    _______________________________________________
>>>                    m5-dev mailing list
>>>                    [email protected] <mailto:[email protected]>
>>>                    http://m5sim.org/mailman/listinfo/m5-dev
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
