I will try it tomorrow and let you know.
On Mon, Jul 12, 2010 at 6:51 PM, Gabriel Michael Black <[email protected]> wrote: > Did that patch fix it? > > Gabe > > > Quoting Gabe Black <[email protected]>: > > Here's more or less what's going on as far as the register index goes. The >> load microop needs to store into register 1, and it needs to be sure it >> stores into the version visible from the "user" mode. It does that by >> applying the intRegInMode function, which offsets register index 1 by >> MODE_USER times the number of integer registers. Later, the flattenIntIndex >> function is called to unambiguously figure out what a particular >> register index goes with, given the current values of various pieces of ISA state >> (specifically the CPU mode for ARM) or other conditions flagged by >> putting the register index in a particular range. You can see in the >> "else" clause that the mode is re-extracted from the index using an >> integer division and the offset of 1 is extracted using a mod. This is >> then translated into the actual register visible from that mode with >> that index. From this point forward, the CPU can pretend the integer >> register file is one big flat contiguous space and totally ignore the >> ISA's register indexing semantics. >> >> One additional mechanism is at work when actually storing the register >> index in the StaticInst object. There are really three different types >> of register indices: integer, float, and misc (which could also have been >> called "other" or "control"), but these are all stored in the same array >> with no flag to distinguish them. To be able to tell them apart later, >> an offset is added to them so that the integer indexes are all from 0 to >> FP_Base_DepTag - 1, the floating point registers are all from >> FP_Base_DepTag to Ctrl_Base_DepTag - 1 (inconsistently named for >> historical reasons), and the misc registers are from Ctrl_Base_DepTag >> and up.
This is a fairly fragile system, since if, say, an integer index >> is large enough, it might spill into the fp or misc range and be >> misidentified later. I'd like to replace this system with >> multidimensional indices that track the type explicitly, but I won't get >> into the specifics here. >> >> A flaw in the combination of these two systems is why this particular >> index isn't being handled correctly. The FP_Base_DepTag is being set to >> NumIntRegs, but in reality, because of the intRegInMode function, integer >> indices can be a lot larger than that. When this particular index is >> being processed, 577 (register 1 after tagging) is even bigger than Ctrl_Base_DepTag, so it gets >> interpreted as a misc index. These aren't renamed and are passed >> directly to the ISA object to interpret/use as array indexes. I think >> the reason this works fine on the simple CPUs is that they always know >> explicitly what type of register something is, so when they go to undo >> the DepTag offset, they just pull out the right one automatically. O3 >> isn't able to do that. I have a patch attached that simply multiplies >> the FP_Base_DepTag value by 32, since 31 is the largest MODE_* constant. >> In a more final version I'd want to do something that used the real >> upper limit instead of relying on the fact that multiplying by 32 gives the right >> answer. >> >> I'm not completely convinced this will solve your segfault, though. It >> looks like that is caused by a bad DynInst pointer ending up in one of >> the TimeBuffer structures used to pass values between stages. When that >> goes out of scope and attempts to reference count itself, the junk >> pointer is dereferenced and causes a segfault. You can tell the pointer >> is bad because in 64-bit x86, pointers have to be canonical, or in other >> words be sign-extended beyond the largest implemented virtual address >> bit. Anything with 8 as its most significant hex digit is pretty much guaranteed to >> be bogus.
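To make the two index schemes described above concrete, here is a minimal C++ sketch. It is not M5 code. NumIntRegs = 36 is inferred from the trace later in the thread (577 == MODE_USER * 36 + 1, with MODE_USER = 0x10), the original Ctrl_Base_DepTag of 105 is inferred from "Renamed misc reg 472" (577 - 472), and NumFloatRegs is simply chosen to reproduce that value. The helper names (modeOf, regOf, classify) are hypothetical. It shows the tagged index 577 landing in the misc range under the original layout and staying in the integer range once FP_Base_DepTag is multiplied by 32, in the spirit of the attached patch, which is not reproduced here.

```cpp
#include <cassert>

// Illustrative constants only, inferred from the trace, not copied from M5.
constexpr int NumIntRegs = 36;
constexpr int NumFloatRegs = 105 - NumIntRegs;  // makes old Ctrl_Base_DepTag 105
constexpr int MODE_USER = 0x10;                 // ARM User mode
constexpr int MODE_SYSTEM = 0x1f;               // largest MODE_* constant (31)

// intRegInMode(): tag a register index with the mode whose bank it names.
constexpr int intRegInMode(int mode, int reg)
{
    return mode * NumIntRegs + reg;
}

// The "else" clause of flattenIntIndex() undoes the tag with / and %.
constexpr int modeOf(int idx) { return idx / NumIntRegs; }
constexpr int regOf(int idx) { return idx % NumIntRegs; }

enum class RegClass { Int, Float, Misc };

// Classify a flat StaticInst register index by its DepTag range.
RegClass classify(int idx, int fpBaseDepTag)
{
    assert(idx >= 0);
    const int ctrlBaseDepTag = fpBaseDepTag + NumFloatRegs;
    if (idx < fpBaseDepTag)
        return RegClass::Int;
    if (idx < ctrlBaseDepTag)
        return RegClass::Float;
    return RegClass::Misc;
}

// Original layout: FP_Base_DepTag == NumIntRegs.
constexpr int OldFpBaseDepTag = NumIntRegs;
// Patched layout: leave room for every mode-tagged integer index.
constexpr int NewFpBaseDepTag = NumIntRegs * 32;

static_assert(intRegInMode(MODE_USER, 1) == 577,
              "matches the regIdx printed in the trace");
// With modes up to 31, the largest tagged index is 32 * NumIntRegs - 1.
static_assert(intRegInMode(MODE_SYSTEM, NumIntRegs - 1) < NewFpBaseDepTag,
              "tagged integer indices must stay below FP_Base_DepTag");
```

The second static_assert captures the reasoning behind the factor of 32: with mode values up to 31, the largest tagged index is 31 * NumIntRegs + (NumIntRegs - 1), one less than 32 * NumIntRegs. Deriving the bound from the largest mode constant instead of hard-coding 32 is what the "more final version" would need.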
It could still be, however, that writing beyond the end of a >> register file trampled on that structure and corrupted the pointer. >> There are asserts in the ISA object's functions that should prevent >> that, but those would be disabled if you were running with m5.fast (not >> recommended unless you're 100% certain everything is working). >> >> Gabe >> >> Min Kyu Jeong wrote: >> >>> The following is an excerpt from the disassembly:
>>>
>>> 117c: e321f013 msr CPSR_c, #19 ; 0x13
>>> 1180: e24fd08c sub sp, pc, #140 ; 0x8c
>>> 1184: e321f011 msr CPSR_c, #17 ; 0x11
>>> 1188: e24fd094 sub sp, pc, #148 ; 0x94
>>> 118c: e321f012 msr CPSR_c, #18 ; 0x12
>>> 1190: e24fd09c sub sp, pc, #156 ; 0x9c
>>> 1194: e321f01b msr CPSR_c, #27 ; 0x1b
>>> 1198: e24fd0a4 sub sp, pc, #164 ; 0xa4
>>> 119c: e321f017 msr CPSR_c, #23 ; 0x17
>>> 11a0: e24fd0ac sub sp, pc, #172 ; 0xac
>>> 11a4: e321f01f msr CPSR_c, #31 ; 0x1f
>>> 11a8: e24fd0b4 sub sp, pc, #180 ; 0xb4
>>> 11ac: e321f013 msr CPSR_c, #19 ; 0x13
>>> 11b0: ea000002 b 11c0 <skipLabel_00000002>
>>>
>>> 000011b4 <LabStr_00000002>:
>>> 11b4: 5f444441 69736162 00315f63 ADD_basic_1.
>>>
>>> After 0x11b0, the branch is predicted to fall through, so the string label is >>> fetched. The bytes that cause the mess are at address 0x11b8, >>> so I think it is the second 4-byte chunk of the label: 0x69736162. >>> >>> The following is the relevant part of the trace from the run. There are >>> some additional prints that I added.
>>>
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b4 (0) created [sn:1585]
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is: svcpl
>>> 17680000: global: MicroLdrUop, regIdx : 577
>>> 17680000: global: MicroLdrUop, regIdx : 581
>>> 17680000: global: MicroLdrUop, regIdx : 582
>>> 17680000: global: MicroLdrUop, regIdx : 584
>>> 17680000: global: MicroLdrUop, regIdx : 589
>>> 17680000: global: MicroLdrUop, regIdx : 590
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (0) created [sn:1586]
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is: addi_uopvs r34, r3, #0
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (1) created [sn:1587]
>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is: subi_uopvs r3, r3, #24
>>> 17680000: system.cpu.fetch: [tid:0]: Done fetching, reached fetch bandwidth for this cycle.
>>> 17680000: system.cpu.fetch: [tid:0]: Setting PC to 0x0011b8.
>>> ...
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (2) created [sn:1588]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs XXX, [r34, #24]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (3) created [sn:1589]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs XXX, [r34, #20]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (4) created [sn:1590]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs XXX, [r34, #16]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (5) created [sn:1591]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs XXX, [r34, #12]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (6) created [sn:1592]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs XXX, [r34, #8]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (7) created [sn:1593]
>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
>>> XXX, [r34, #4]
>>> ...
>>> 18440000: system.cpu.rename: [tid:0]: Processing instruction [sn:1588] with PC 0x11b8.
>>> 18440000: system.cpu.rename: Adjusting reg index from 105 to 105.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 105, got physical reg 512.
>>> 18440000: system.cpu.rename: [tid:0]: Register 512 is ready.
>>> 18440000: global: [sn:1588] has 1 ready out of 4 sources. RTI 0)
>>> 18440000: system.cpu.rename: Flattening index 35 to 35.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 35, got physical reg 147.
>>> 18440000: system.cpu.rename: [tid:0]: Register 147 is ready.
>>> 18440000: global: [sn:1588] has 2 ready out of 4 sources. RTI 0)
>>> 18440000: system.cpu.rename: Flattening index 34 to 34.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 34, got physical reg 72.
>>> 18440000: system.cpu.rename: [tid:0]: Register 72 is not ready.
>>> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 577, got physical reg 984.
>>> 18440000: system.cpu.rename: [tid:0]: Register 984 is ready.
>>> 18440000: global: [sn:1588] has 3 ready out of 4 sources. RTI 0)
>>> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
>>> 18440000: global: Renamed misc reg 472
>>> *18440000: global: Renamed reg 472 to physical reg 984 old mapping was 984*
>>> *18440000: system.cpu.rename: [tid:0]: Renaming arch reg 577 to physical reg 984.*
>>> 18440000: system.cpu.rename: [tid:0]: Adding instruction to history buffer (size=3), [sn:1588].
>>>
>>> On Fri, Jul 9, 2010 at 4:23 PM, Gabriel Michael Black >>> <[email protected]> wrote: >>> >>> Thanks for the extra info, which should be very helpful. Can you >>> please tell us what the actual bytes are for the junk instruction?
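The word Gabe asks about can be decoded by hand, and doing so lines up with the trace. The sketch below assumes the standard ARM block data transfer (LDM/STM) encoding. Its constants are inferred from the thread rather than taken from M5: NumIntRegs = 36 (from 577 == 16 * 36 + 1), and NumLogicalRegs = 105 and NumPhysicalRegs = 512 (from "Looking up arch reg 105, got physical reg 512" and the formula 984 = 577 - numLogicalRegs + numPhysicalRegs quoted in the thread). All struct and function names are hypothetical. Decoding 0x69736162 gives condition 0x6 (VS, matching the `_uopvs` suffixes), base register r3, the S bit set, and register list 0x6162 (r1, r5, r6, r8, r13, r14), which reproduces the regIdx values 577 through 590 printed by MicroLdrUop. Six registers times 4 bytes also matches the `subi_uopvs r3, r3, #24` micro-op.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative constants, inferred from the trace (not read from M5).
constexpr int NumIntRegs = 36;
constexpr int MODE_USER = 0x10;
constexpr int NumLogicalRegs = 105;   // also the original Ctrl_Base_DepTag
constexpr int NumPhysicalRegs = 512;

// Extract bits hi..lo (inclusive) of a 32-bit word; field width must be < 32.
constexpr std::uint32_t bits(std::uint32_t word, int hi, int lo)
{
    return (word >> lo) & ((1u << (hi - lo + 1)) - 1);
}

struct LdmStm {
    std::uint32_t cond, p, u, s, w, l, rn, regList;
};

// Decode the fields of an ARM block data transfer (LDM/STM) instruction.
LdmStm decodeLdmStm(std::uint32_t word)
{
    assert(bits(word, 27, 25) == 0x4u);  // 0b100 selects LDM/STM
    return LdmStm{bits(word, 31, 28), bits(word, 24, 24), bits(word, 23, 23),
                  bits(word, 22, 22), bits(word, 21, 21), bits(word, 20, 20),
                  bits(word, 19, 16), bits(word, 15, 0)};
}

// Destination indices of the load micro-ops. The S bit selects User-mode
// registers unless this is a load that includes r15 (exception return),
// so in that case each index is tagged with MODE_USER (force_user).
std::vector<int> destIndices(const LdmStm &inst)
{
    const bool forceUser =
        inst.s && !(inst.l && (inst.regList & (1u << 15)));
    std::vector<int> out;
    for (int reg = 0; reg < 16; ++reg) {
        if (inst.regList & (1u << reg))
            out.push_back(forceUser ? MODE_USER * NumIntRegs + reg : reg);
    }
    return out;
}

// A misc-range index is not renamed; per the formula in the thread it gets
// a fixed slot above the physical register file.
int miscIndex(int archIdx) { return archIdx - NumLogicalRegs; }
int miscPhysIndex(int archIdx) { return miscIndex(archIdx) + NumPhysicalRegs; }
```

Under these assumptions, misc index 472 is what would then be handed to the ISA object as an array index, which is presumably far beyond the end of its misc register array.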
>>> >>> >>> Gabe >>> >>> Quoting Min Kyu Jeong <[email protected]>: >>> >>> I looked into this, but I still don't fully understand how the >>> out-of-bound register index causes the segfault. Instead, I will just >>> describe what >>> is happening in the hope that someone will spot a clue in it. >>> >>> The register index that goes out of bounds is the architectural >>> register >>> index stored in the StaticInst class's _destRegIdx[0]. The >>> particular StaticInst >>> I am getting this from is MicroLdrUop. In the constructor of >>> the MacroMemOp >>> (this invalid garbage instruction from the mispredicted path is >>> decoded as >>> LdmStm), MicroLdrUop instances are generated. The destination >>> register >>> indices for the uops are generated from the bit vector, and there >>> is this bit of >>> code:
>>>
>>>     if (force_user) {
>>>         regIdx = intRegInMode(MODE_USER, regIdx);
>>>     }
>>>
>>> that changes regIdx from 1 to 577. This is stored in the >>> _destRegIdx[0] variable >>> of the MicroLdrUop StaticInst. During renaming, 577 is renamed >>> to 984 = 577 >>> - numLogicalRegs + numPhysicalRegs. >>> >>> This intRegInMode() call is what I found suspicious, since the register >>> banking is >>> handled by the ArmISA::flattenIntIndex() call. >>> >>> Anyway, the simulation segfaults during the advance() >>> function call of the >>> TimeBuffer at the end of the NEXT tick. The following is the >>> call stack.
>>>
>>> #0 0x000000000040a2fa in RefCounted::decref (this=0x8d4810708d48c84d) at build/ARM_FS/base/refcnt.hh:51
>>> #1 0x0000000000761daa in RefCountingPtr<BaseO3DynInst<O3CPUImpl> >::del (this=0x9cd200) at build/ARM_FS/base/refcnt.hh:69
>>> #2 0x0000000000761dc1 in ~RefCountingPtr (this=0x9cd200) at build/ARM_FS/base/refcnt.hh:85
>>> #3 0x000000000077ebf9 in ~commitComm (this=0x9cd1a8) at build/ARM_FS/cpu/o3/comm.hh:153
>>> #4 0x000000000077ec49 in ~TimeBufStruct (this=0x9ccec8) at build/ARM_FS/cpu/o3/comm.hh:110
>>> #5 0x00000000007884c4 in TimeBuffer<TimeBufStruct<O3CPUImpl> >::advance (this=0x1991038) at build/ARM_FS/base/timebuf.hh:187
>>> #6 0x0000000000798c31 in FullO3CPU<O3CPUImpl>::tick (this=0x198d310) at build/ARM_FS/cpu/o3/cpu.cc:523
>>>
>>> I made this segfault go away by overriding index 577 with 0. >>> >>> Thanks, >>> >>> On Wed, Jun 30, 2010 at 2:17 PM, Gabriel Michael Black < >>> [email protected]> wrote: >>> >>> Could you be more specific? There are a lot of register-related >>> indexes and >>> I'm not sure exactly which ones you're talking about. >>> Could you walk through >>> what's happening from the illegal encoding, through the >>> decoder, through the >>> CPU and up to the segfault? I don't totally understand the >>> mechanics of the >>> failure at the moment, but my gut reaction is that the >>> decoder should have >>> returned an "Unknown(machInst)" when the index was out of >>> bounds. I'm not >>> convinced that's what's happening, though. >>> >>> >>> Gabe >>> >>> Quoting Min Kyu Jeong <[email protected]>: >>> >>> I just found a case somewhat related to this. It is not exactly >>> an assertion >>> but >>> a segfault from (non)instructions on the mispredicted path. >>> >>> When the operand register index is out of range, >>> the call to >>> timeBuffer.advance() right after the renaming of such >>> registers causes a >>> segfault.
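Gabe's diagnosis in the Jul 12 reply rests on x86-64 canonical addresses, and frame #0 of this backtrace shows the suspicious value (this=0x8d4810708d48c84d). Below is a minimal sketch of that check, assuming 48 implemented virtual-address bits (typical of x86-64 processors of this period, but an assumption here); isCanonical() is a hypothetical helper, not part of M5.

```cpp
#include <cassert>
#include <cstdint>

// Assumed number of implemented virtual-address bits.
constexpr int VirtAddrBits = 48;
static_assert(VirtAddrBits > 1 && VirtAddrBits < 64, "unsupported width");

// An address is canonical when bits 63..(VirtAddrBits - 1) are all copies of
// bit (VirtAddrBits - 1), i.e. the address is sign-extended. Those are
// 65 - VirtAddrBits bits, which must be all zeros or all ones.
bool isCanonical(std::uint64_t addr)
{
    const std::uint64_t upper = addr >> (VirtAddrBits - 1);
    const std::uint64_t allOnes =
        (std::uint64_t{1} << (65 - VirtAddrBits)) - 1;
    return upper == 0 || upper == allOnes;
}
```

Under that assumption, the `this` pointer in frame #0 fails the check, while the RefCountingPtr address in frames #1 and #2 (0x9cd200) passes, which fits the picture of a junk DynInst pointer stored inside an otherwise valid TimeBuffer entry.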
I bypassed this problem by mapping that >>> out-of-bound register >>> index to register zero during renaming (more >>> specifically, during >>> index >>> flattening). I think raising a fault would be a better >>> solution, but I am >>> holding >>> off on actually doing it. Any suggestions would be >>> appreciated. >>> >>> P.S. Is the [m5-dev] tag in the subject added by the >>> mailing list, or should >>> I >>> add it myself? >>> >>> Thanks, >>> >>> Min >>> >>> On Mon, Jun 14, 2010 at 3:52 PM, Gabriel Michael Black < >>> [email protected]> >>> wrote: >>> >>> It's important to distinguish between M5 making >>> sense and the code it's >>> executing making sense. We shouldn't (and I hope >>> don't) have any asserts >>> that check conditions controllable from the >>> simulated code, since those >>> should generally just cause a fault and may, as >>> you point out, be >>> mis-speculated. It's fine to check that M5 is >>> internally consistent, >>> though. >>> This is supposed to work in all the CPU models and, >>> as far as I know, >>> generally does. M5's CPU models should, to a >>> first order, correctly do >>> whatever wacky, nonsensical things the >>> instruction memory tells them to do >>> without complaining. If you've found a case where >>> it doesn't (which has >>> happened before), please let us know so we can fix it. >>> >>> Gabe >>> >>> >>> Quoting Min Kyu Jeong <[email protected]>: >>> >>> Is it possible for speculatively fetched >>> instructions to cause >>> programming assertions to fail? Until a branch >>> is resolved, whatever is on >>> the predicted path (even >>> non-instructions) could >>> be fetched and decoded. >>> Can't >>> assertions on instruction sanity fail for those? >>> >>> I am trying to get the O3 CPU model working for ARM. >>> In many cases the >>> first >>> instruction is a branch followed by an >>> interrupt vector table.
I was >>> wondering whether such cases exist for other CPU >>> models and, if so, how they are >>> handled. >>> >>> Thanks, >>> >>> Min >>> >>> >>> >>> _______________________________________________ >>> m5-dev mailing list >>> [email protected] >>> http://m5sim.org/mailman/listinfo/m5-dev
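Min's workaround (remapping an impossible index to register 0) and Gabe's guideline (assert only on M5's internal consistency; anything the simulated code controls, including mis-speculated junk, should become a fault) suggest a validation step along these lines. This is a hypothetical sketch, not M5 code: CheckedIndex and checkModeIndex are invented names, NumIntRegs = 36 is inferred from the trace (577 == 16 * 36 + 1), and 31 is the largest MODE_* constant mentioned in the thread. An out-of-range index is reported to the caller, which could turn it into a fault or an "Unknown(machInst)" result as Gabe suggested for the decoder, instead of being asserted on or silently redirected. A fuller check would also reject mode values that are not real ARM modes.

```cpp
#include <cassert>

// Hypothetical limits: NumIntRegs is inferred from the trace, and 31 is the
// largest MODE_* constant mentioned in the thread.
constexpr int NumIntRegs = 36;
constexpr int MaxMode = 31;

struct CheckedIndex {
    bool ok;   // false: the index cannot have come from intRegInMode()
    int mode;  // meaningful only when ok is true
    int reg;   // meaningful only when ok is true
};

// Validate a mode-tagged integer index before it is flattened.
CheckedIndex checkModeIndex(int idx)
{
    // The value comes from the simulated (possibly mis-speculated)
    // instruction stream, so report it rather than asserting on it.
    if (idx < 0 || idx >= (MaxMode + 1) * NumIntRegs)
        return CheckedIndex{false, 0, 0};

    CheckedIndex result{true, idx / NumIntRegs, idx % NumIntRegs};
    // This holds by construction, so it is a legitimate internal assert.
    assert(result.reg >= 0 && result.reg < NumIntRegs);
    return result;
}
```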
