Sorry, I created the checkpoint I referred to with an O3 CPU with caches. From what I recall reading, caches don't get restored from checkpoints. Since the checkpoint wasn't taken during the benchmark run, I assumed that was okay.
-Andrew

On Wed, May 2, 2012 at 9:07 PM, Ali Saidi <sa...@umich.edu> wrote:

> You haven't answered the question about if you created the checkpoints with an atomic cpu without caches.
>
> Ali
>
> On 02.05.2012 19:58, Andrew Cebulski wrote:
>
> I have not run with the checker CPU recently. Here's the stderr output from a run I did awhile back:
> http://dl.dropbox.com/u/2953302/gem5/err.0
> Note that the instruction match error is before my benchmark actually starts running. The start of my boot script checks to see if my files image is mounted (which it is), then continues on to run the benchmark. I booted the system, mounted my files image, then took a checkpoint. I've been running all my tests from that checkpoint. I found where my benchmark started based on the ASID (from ExecAsid debug flag).
> I delayed the start of gathering trace data until the second-to-last linear increase in dynamic instructions in-flight. I'm running a new trace now.
> -Andrew
>
> On Wed, May 2, 2012 at 5:28 PM, Ali Saidi <sa...@umich.edu> wrote:
>
>> Something is wrong well before this point. There is no reason that address 0x0 or 0x4 should be translated.
>>
>> Did you happen to create a checkpoint when caches were in the system?
>>
>> Have you tried to run with the checker cpu and see if it detects any errors?
>>
>> Ali
>>
>> On 02.05.2012 17:22, Andrew Cebulski wrote:
>>
>> They are data TLB misses that occur as the in-flight instruction count rises (at 0x0 and 0x4). The last TLB miss before the in-flight instruction count finally linearly decreases is to 0x200. Also, at the start of the rising slope, I see a miss to 0x8 and 0x2508c.
>> Here's a trace file:
>> http://dl.dropbox.com/u/2953302/gem5/tlb.out
>> To reduce size, I just have lines that have either TLB or walker in them. I do see only a handful of instruction TLB misses.
>> -Andrew
>>
>> On Wed, May 2, 2012 at 11:10 AM, Ali Saidi <sa...@umich.edu> wrote:
>>
>>> Hi Andrew,
>>>
>>> Thanks for digging into this. I think there is an issue somewhere, but I'm still not sure where.
>>>
>>> Ali
>>>
>>> On 01.05.2012 23:34, Andrew Cebulski wrote:
>>>
>>> Okay, I'm positive now that the issue lies with delayed translations that are squashed before finishing.
>>>
>>> On the data or instruction side? You seem to allude to data in the paragraph below, but then instructions in the latter text.
>>>
>>> It seems to me like speculative load/stores are being executed, rather than waiting for the instructions to commit. Once the instructions begin getting (speculatively) executed in the TLB, a reference is left there, which seems hard to root out and dereference after the instruction ends up being squashed. At least, I have not been able to find that out in the source code as of yet. Can anyone clarify on this?
>>>
>>> There should only be one translation outstanding from each instruction and data side walker. Any nested transactions should be queued in the walker. Until one finishes, I'm not sure how multiple would ever be outstanding.
>>>
>>> Recall the following image that shows how the number of dynamic instruction (DynInst) objects in-flight increases linearly for varying periods of time:
>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png
>>> After enabling the TLB debug flag, I see that the linear increase in instructions in flight is proportional to the number of TLB misses. These TLB misses have a much larger delay (resulting in translation delays) due to the fact that DramSim2 models the memory system more accurately. It seems that with the classic memory system, TLB misses often do not have translation delays. For whatever reason, it would also seem that every instruction that has a TLB miss also is eventually squashed...
>>>
>>> From a data side perspective this is reasonable. While a miss is outstanding at some point instructions will stop committing and thus the instructions in flight will begin to rise until the miss is satisfied.
>>>
>>> Here's a summary of outputs from my trace. These two DPRINTF messages appear on the rising slopes (repeated up until the peak):
>>> TLB Miss: Starting hardware table walker for 0(656)
>>> TLB Miss: Starting hardware table walker for 0x4(656)
>>>
>>> This is interesting/odd. I don't know a good reason why (1) a miss would be outstanding to both address 0 and address 4 at the same time. In almost all cases these pages are marked as no-access to detect segfaults. Perhaps there is an issue where the cpu is getting into a loop faulting on a bad access and then faulting again on the fault handler. I could imagine this would happen if there was some corruption in the memory system (for example the timings in dramsim exposing a bug in the cache models or something).
>>>
>>> At the peak, the following message appears (from fetch) almost every tick for (what I believe to be) every single one of the table walkers that were squashed.
>>> Fetch is waiting ITLB walk to finish!
>>>
>>> There must be another walk in flight? The instruction side will only have one fault outstanding at once. Successive branch mispredicts will re-direct fetch but there is code that catches the fact that a different walk completed than expected and "does the right thing."
>>>
>>> The problem is that these ITLB table walks are for instructions that were squashed as much as 0.3 billion cycles earlier, and have since been removed from the CPU's instruction list.
>>>
>>> I'm not following here.
>>>
>>> Any help will be greatly appreciated in solving this problem. I've hit a roadblock with getting Ruby working with ARM, most likely due to the fact that ARM has disjoint memory (x86 and Alpha do not).
>>> There's the 256 MB for physical memory, then the 64 MB for the boot loader. I brought this up in my last email about trying to get Ruby working. Therefore, I'm trying to get this DramSim2 integration fixed so I can start modeling FS with DRAM memory.
>>>
>>> Brad/Steve/Nilay anyone have a suggestion on how to make this work?
>>>
>>> Note that these problems also occur in Soplex from the SPEC CPU2006 benchmark suite (also hits 1500 in-flight instructions assertion). Due to time constraints, I haven't tested on other benchmarks.
>>> Thanks,
>>> Andrew
>>> On Tue, May 1, 2012 at 4:27 AM, Andrew Cebulski <af...@drexel.edu> wrote:
>>>
>>>> Hey Gabe,
>>>> Thanks for this...very helpful. I just recently got back into debugging this problem. I made a small change in src/base/refcnt.hh to allow me to return the current count of references to a DynInst object.
>>>> I then modified existing DPRINTFs to also print out reference counts, then added some of my own when I needed extra visibility.
>>>> I've found one memory store instruction that seems to be getting lost. What's happening is that it progresses as far as getting executed in the IEW once, but a delayed translation occurs, deferring the store. By the time it reenters the IEW, the IQ has marked the instruction as squashed. Everything progresses as usual from here on out, with one exception. When the instruction is removed from the CPU's instruction list, there is one reference count hanging.
>>>> I've added in some additional debugging for my traces to help narrow down where this reference is coming from. As far as I can tell, it's because of a call to initiateAcc() within the executeStore function in the lsq unit. Please see the following two traces. The first trace shows what I just discussed.
>>>> The second trace is another memory store instruction that got squashed; however, it was squashed upon its first entry into the IEW, so it never started execution.
>>>> http://dl.dropbox.com/u/2953302/gem5/lostinstruction.out
>>>> http://dl.dropbox.com/u/2953302/gem5/similarinstruction.out
>>>> Let me know if you have any ideas based on these two instruction traces. I do not understand how the initiateAcc function results in another reference, but maybe someone else does.... Since I don't see how it makes a reference, it's hard to find out how to make sure it gets dereferenced...
>>>> Unfortunately, I haven't been able to add a DPRINTF in src/base/refcnt.hh ...this would make things more clear (i.e. exactly when references/dereferences occur). Let me know if you have any advice on this...if it's possible. I can't seem to get the right include files, and likely right SConscript compile order...
>>>> Thanks,
>>>> Andrew
>>>>
>>>> On Sat, Apr 7, 2012 at 9:48 PM, Gabe Black <gbl...@eecs.umich.edu> wrote:
>>>>
>>>>> Without digging into things too deeply, it looks like you may be leaking references to dynamic instructions. The CPU may think it's done with one, but until that final reference is removed, the object will hang around forever. I think I've had problems before where the reference count ended up off by one somehow and instructions would start piling up. It's also possible that a clog develops in O3's pipeline and some internal structure stops letting instructions through and starts accumulating them. Either of these problems will be annoying to track down, but with enough digging I've been able to fix these sorts of things.
>>>>>
>>>>> This may have more to do with O3 not handling the benchmark you're running well rather than a problem with your new DRAM model.
>>>>> There may be some interaction between the two, though, where the new memory makes the timing line up to cause O3 to behave poorly. What you can do is instrument dynamic instruction creation and destruction and reference counting (try printing "this" for both the reference counting wrapper and the dyn inst itself) and turn it on as close as you can to where things go bad tick wise. Then look for an instruction which gets lost, and look for where its reference count is incremented and decremented. It should be relatively easy to pair up where references are created and destroyed, and you should be able to identify the reference which never goes away. Then you need to figure out where that reference is being created. After that, you should have enough information to identify why the reference counting isn't being done correctly. It's arduous, but that's the only way.
>>>>>
>>>>> It's important to also make sure reference counts aren't decremented to zero prematurely. I had a problem once where that happened and the memory behind the object was updated by something that didn't know it was dead. The memory had since been reallocated to another object of the same type, so that other object reflected what happened to the phantom one. If I remember, that manifested as something weird like an add causing a page fault or something.
>>>>>
>>>>> Gabe
>>>>>
>>>>> On 04/07/12 18:21, Andrew Cebulski wrote:
>>>>>
>>>>> Hi all,
>>>>> I've looked into this problem some more, and have put together a couple traces. I've been becoming more familiar with how gem5 handles dynamic instructions, in particular how it destroys them. I have two traces to compare, one with the physical memory, and the other with the integrated dramsim2 dram memory. I also have two plots showing instruction counts over time (sim ticks).
>>>>> All of these are linked at the end of the email.
>>>>> First, I'm going to go into what I've been able to interpret regarding how instructions are destroyed. In particular, comparing when DynInst's vs. DynInstPtr's are deconstructed/removed from the cpu. I separate these because I've seen a difference, as I discuss later. These explanations are fairly non-existent on the wiki. There is a section header waiting to be filled...
>>>>> From what I have been able to gather from the code, there is a list of all the instructions in flight in cpu/o3/cpu.cc called instList, with the type DynInstPtr. There are three conditions to instructions being cleaned from this list:
>>>>> 1.) The ROB retires its head instruction
>>>>> 2.) Fetch receives a rob squashing signal from the commit, resulting in removing any instruction not in the ROB
>>>>> 3.) Decode detects an incorrect branch prediction, resulting in removal of all instructions back to the bad seq num.
>>>>> Once all five stages have completed, the CPU cleans up all the removed in-flight instructions. This line in particular in cleanUpRemovedInsts() in cpu/o3/cpu.cc deconstructs a DynInstPtr:
>>>>> instList.erase(removeList.front());
>>>>> When I turn on the debug flag O3CPU, I see the message "Removing instruction, ..." (from o3/cpu.cc) with the threadNum, seqNum and pcState after all 5 cpu stages have completed, and one of the conditions above is met. I also see what tick it occurs on.
>>>>> When I turn on the DynInst debug flag, I see when instructions are created and destroyed (cpu/base_dyn_inst_impl.hh) and what tick. From analyzing the trace files, I've gathered that this takes into account that instructions have different execution lengths. So if one tick a memory instruction in the instList (DynInstPtr) is removed, the destruction of the DynInst for that memory instruction will occur much later (i.e.
>>>>> 1M ticks later). I have yet to determine how this is implemented.
>>>>> Now for the problem.
>>>>> What I'm seeing when I run dramsim2 dram memory is a significant difference between the size of the instList vector (of DynInstPtr objects), and the size of dynamic instruction count (of DynInst objects). The benchmark I'm running is libquantum from SPEC 2006. For the first roughly 130B ticks, the dynamic instruction count kept in cpu/base_dyn_inst_impl.hh shadows the instList size in o3/cpu.cc (figure linked below) very closely. Around tick 130B after libquantum started, it starts hitting what I'm assuming are loops (therefore branch prediction), resulting in some behavior that seems to imply improper instruction handling (i.e. more instructions in flight than allowed by ROB).
>>>>> I wasn't able to sync-up the physical and dramsim2 traces exactly by trace, but they should represent roughly the same area of execution. They don't execute the same due to the dramsim2 modeling the memory differently (i.e. latency and other delays).
>>>>> I've shared both traces on my public Dropbox here:
>>>>> http://dl.dropbox.com/u/2953302/gem5/physical-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU.out.gz
>>>>> http://dl.dropbox.com/u/2953302/gem5/dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz
>>>>> Here are a couple plots of tick versus instruction count, with respect to cpu->instcount in cpu/base_dyn_inst_impl.hh and instList.size() in cpu/o3/cpu.cc:
>>>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_physical.png
>>>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png
>>>>> Note that I added the printout of the instList size to an existing O3CPU DPRINTF in cleanUpRemovedInsts() in cpu/o3/cpu.cc.
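A rough Python sketch of how the two series in those plots (tick versus cpu->instcount, and tick versus instList.size()) could be pulled out of a gzip-compressed trace. It is only an illustration under stated assumptions: that each trace line is whitespace-separated with the tick as the first field, and that the count sits at a fixed field position, which depends on how the DPRINTF was modified. The names extract_series and read_trace are hypothetical helpers, not part of gem5.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def extract_series(lines: Iterable[str], must_contain: Tuple[str, ...],
                   value_field: int) -> Iterator[Tuple[int, int]]:
    """Yield (tick, value) pairs from trace lines that contain every keyword.

    Assumption: fields are whitespace-separated, the tick comes first
    (for example "1000:"), and the count is in 1-based field `value_field`,
    numbered the way awk numbers fields.
    """
    for line in lines:
        if not all(word in line for word in must_contain):
            continue
        fields = line.split()
        if len(fields) < value_field:
            continue  # line too short to carry the count
        try:
            tick = int(fields[0].rstrip(":"))
            value = int(fields[value_field - 1])
        except ValueError:
            continue  # skip lines whose fields are not plain integers
        yield tick, value


def read_trace(path: str) -> Iterator[str]:
    """Stream a gzip-compressed trace file line by line as text."""
    with gzip.open(path, "rt", errors="replace") as trace:
        yield from trace
```

For example, list(extract_series(read_trace("trace.out.gz"), ("DynInst", "destroyed"), 11)) would give the destroyed-instruction series, and passing ("instList",) as the keywords would give the instList size series. The file name and the field number 11 are placeholders to adjust for the actual trace.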
>>>>> Here are the commands I ran to parse the traces into data files to analyze in MATLAB and create the plots:
>>>>> zgrep DynInst dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz | grep destroyed | awk '{print $1,$11}' > cpuinstcount.out
>>>>> zgrep instList dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz | awk '{print $1,$11}' > instlistsize.out
>>>>> It seems to me like the problem might lie in gem5, but has just been exposed by integrating this more detailed memory model, dramsim2, into gem5. Either that, or there are some timing errors in how dramsim2 was integrated. I doubt this, however, since those first 190B ticks were executed using the dramsim2 memory. I believe the problem is a combination of memory instructions + complex loops (branch prediction), resulting in improper destroying of instructions.
>>>>> I've included the ROB, Commit, Fetch, DynInst and O3CPU debug flags. There are 192 ROB entries, which is why the instList size generally has a max of about 192 instructions. The dynamic instruction counts (seen in the dramsim2 plot) seem to also imply that instructions are incorrectly being removed from the ROB, and then from the cpu's instruction list in cpu.cc, which allows more and more instructions to be added to the system (possibly from a bad branch).
>>>>> I appreciate any help in debugging this and further figuring out the root problem, just let me know if you need anything else from me. I don't have much more time at the moment to debug, but I can take any advice for quick changes and/or additional traces, then send the results back to the list for discussion.
>>>>> Thanks,
>>>>> Andrew
>>>>> P.S. Paul - I did try decreasing the size of the dramsim2 transaction (and even command) queue from 512 to 32. The same instructions problem occurred.
>>>>> It basically just decreased the execution time.
>>>>>
>>>>> On Wed, Mar 14, 2012 at 2:10 PM, Ali Saidi <sa...@umich.edu> wrote:
>>>>>
>>>>>> The error is that there are more than 1500 instructions currently in flight in the system. It could mean several things:
>>>>>>
>>>>>> 1. The value is somewhat arbitrarily defined and maybe there are more than 1500 in your system at one time?
>>>>>>
>>>>>> 2. Instructions aren't being destroyed correctly
>>>>>>
>>>>>> You could try to run a debug binary so you'll get a list of instructions when it happens or increase the number which may be appropriate for certain situations (but 1500 is quite a few inflight instructions).
>>>>>>
>>>>>> Ali
>>>>>>
>>>>>> On 13.03.2012 10:56, Andrew Cebulski wrote:
>>>>>>
>>>>>> Hi Xiangyu,
>>>>>> I just started looking into this some more. So at first I thought it was due to updating to a more recent revision, but then I went back to revision 8643, added your patch, built and ran....and now get the error with it too (when running ARM_FS/gem5.opt). I'm testing now to see if an update to SWIG might have resulted in this error, maybe someone on the mailing list would know if that's possible. The difference is 1.3.40 vs. 2.0.3, both of which are supported according to the dependencies wiki page.
>>>>>> Just for completeness, here's the error from revision 8643:
>>>>>> build/ARM_FS/cpu/base_dyn_inst_impl.hh:149: void BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion `cpu->instcount
>>>>>> I have not tried running with gem5.debug, so I will be doing that today. Maybe this is an assertion that is occurring due to an optimization. That would mean it wouldn't be triggered in gem5.debug since it runs without optimizations. Have you tested all debug, opt and fast with your tests?
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>> On Tue, Mar 13, 2012 at 1:37 PM, Rio Xiangyu Dong <riosher...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Andrew,
>>>>>>>
>>>>>>> I didn’t see this error in my simulations. May I ask which gem5 version you are using? I find some of the latest code updates do not comply with my changes. I am still using the DRAMsim2 patch on Gem5 repo8643, and have run all the runnable benchmarks in SPEC2006, SPEC2000, EEMBC2, and PARSEC2 on ARM_SE.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Xiangyu
>>>>>>>
>>>>>>> *From:* Andrew Cebulski [mailto:af...@drexel.edu]
>>>>>>> *Sent:* Thursday, March 08, 2012 6:52 PM
>>>>>>> *To:* gem5 users mailing list
>>>>>>> *Cc:* riosher...@gmail.com; sa...@umich.edu
>>>>>>> *Subject:* Re: [gem5-users] A Patch for DRAMsim2 Integration
>>>>>>>
>>>>>>> Xiangyu,
>>>>>>>
>>>>>>> I've been having an issue recently with the number of instructions I've been seeing committed to the CPU (I have a separate thread on this). It turns out the issue seems to be coming from this patch you created to integrate DramSim2 with Gem5. Unfortunately, I've been running with gem5.fast, not gem5.opt. So up until now, I haven't been seeing assertions. I thought I'd run it with gem5.opt or debug back in December, but I must not have.
>>>>>>> My runs on the Arm O3 cpu fail with this assertion:
>>>>>>>
>>>>>>> build/ARM/cpu/base_dyn_inst_impl.hh:149: void BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion `cpu->instcount
>>>>>>>
>>>>>>> -Andrew
>>>>>>>
>>>>>>> Date: Sun, 18 Dec 2011 01:48:58 -0800
>>>>>>> From: "Dong, Xiangyu" <riosher...@gmail.com>
>>>>>>> To: "gem5 users mailing list" <gem5-users@gem5.org>
>>>>>>> Subject: [gem5-users] A Patch for DRAMsim2 Integration
>>>>>>> Message-ID: gmail.com>
>>>>>>> Content-Type: text/plain; charset="us-ascii"
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a Gem5+DRAMsim2 patch. I've tested it under both SE and FS modes. I'm willing to share it here.
>>>>>>>
>>>>>>> For those who have such needs, please go to my website www.cse.psu.edu/~xydong <http://www.cse.psu.edu/%7Exydong> to download the patch and test it. To enable DRAMSim2, use the se_dramsim2.py script instead of se.py (for FS, you can create one yourself). The basic idea to enable the DRAMsim2 module is to use the derived DRAMMemory class instead of the PhysicalMemory class.
>>>>>>>
>>>>>>> Please let me know if there are bugs.
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Xiangyu Dong
>>>>>>>
>>>>>>> -------------- next part --------------
>>>>>>> An HTML attachment was scrubbed...
>>>>>>> URL: <http://m5sim.org/cgi-bin/mailman/private/gem5-users/attachments/20111218/f3fdf5da/attachment.html>
>>>>>>
>>>>>> _______________________________________________
>>>>>> gem5-users mailing list
>>>>>> gem5-users@gem5.org
>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
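Gabe's suggestion earlier in the thread (print "this" for both the reference counting wrapper and the dyn inst, then pair up where references are created and destroyed) could be automated along the following lines. This is a minimal Python sketch, not gem5 code: the event tuple layout is hypothetical and stands in for whatever the added debug prints emit, and it assumes a reference taken by a given wrapper address is later released by that same wrapper address.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event layout: (tick, wrapper_addr, inst_addr, "inc" or "dec").
Event = Tuple[int, str, str, str]


def pair_references(events: Iterable[Event]):
    """Pair each "inc" with a later "dec" from the same wrapper on the same instruction.

    Returns (leaked, unmatched_decs):
      leaked         -- {inst_addr: [(wrapper_addr, tick), ...]} for increments never released
      unmatched_decs -- [(tick, wrapper_addr, inst_addr), ...] for decrements with no open increment
    """
    open_refs: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    unmatched_decs: List[Tuple[int, str, str]] = []
    for tick, wrapper, inst, kind in events:
        key = (inst, wrapper)
        if kind == "inc":
            open_refs[key].append(tick)
        elif kind == "dec":
            if open_refs[key]:
                open_refs[key].pop()  # release the most recent matching increment
            else:
                unmatched_decs.append((tick, wrapper, inst))
    leaked: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for (inst, wrapper), ticks in open_refs.items():
        for tick in ticks:
            leaked[inst].append((wrapper, tick))
    return dict(leaked), unmatched_decs
```

Any entry left in leaked names the wrapper that still holds a reference and the tick at which it was taken, which is the starting point for finding where that reference is created. Entries in unmatched_decs would be candidates for the premature-decrement case Gabe also warns about.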