Re: [gem5-users] A Patch for DRAMsim2 Integration

Ali Saidi Wed, 02 May 2012 14:28:52 -0700

 

Something is wrong well before this point. There is no reason that
address 0x0 or 0x4 should be translated.


Did you happen to create a
checkpoint when caches were in the system? 

Have you tried to run with
the checker cpu and see if it detects any errors? 

Ali 

On 02.05.2012 17:22, Andrew Cebulski wrote:

> They are data TLB misses that occur as the in-flight instruction count rises (at 0x0 and 0x4). The last TLB miss before the in-flight instruction count finally decreases linearly is to 0x200. Also, at the start of the rising slope, I see misses to 0x8 and 0x2508c.
> 
> Here's a trace file:
> 
> http://dl.dropbox.com/u/2953302/gem5/tlb.out [26]
> To reduce its size, I only kept lines that contain either TLB or walker.
> I do see only a handful of instruction TLB misses.
> 
> -Andrew
> 
> On Wed, May 2, 2012 at 11:10 AM, Ali Saidi <sa...@umich.edu [27]> wrote:
> 
>> Hi Andrew,
>> 
>> Thanks for digging into this. I think there is an issue somewhere, but I'm still not sure where.
>> 
>> Ali 
>> 
>> On 01.05.2012 23:34, Andrew Cebulski wrote:
>> 
>>> Okay, I'm positive now that the issue lies with delayed translations that are squashed before finishing.
>> 
>> On the data or instruction side? You seem to allude to data in the paragraph below, but then to instructions in the latter text.
>> 
>>> It seems to me like speculative loads/stores are being executed, rather than waiting for the instructions to commit. Once the instructions begin getting (speculatively) executed in the TLB, a reference is left there, which seems hard to root out and dereference after the instruction ends up being squashed. At least, I have not been able to find where that happens in the source code yet. Can anyone clarify this?
>> 
>> There should only be one translation outstanding from each of the instruction-side and data-side walkers. Any nested transactions should be queued in the walker until the outstanding one finishes, so I'm not sure how multiple would ever be outstanding.
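A minimal C++ sketch of the invariant Ali describes: one walk in flight per walker, with later misses queued behind it. SimpleWalker and WalkRequest are hypothetical names. This is not gem5's ARM table walker, only an illustration of the queuing discipline under that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical, simplified walker: at most one walk outstanding, the rest
// queued. Not gem5 code; names and structure are assumptions.
struct WalkRequest {
    std::uint64_t vaddr;   // virtual address that missed in the TLB
    std::uint64_t seqNum;  // sequence number of the requesting instruction
};

class SimpleWalker {
  public:
    // Start the walk immediately if idle, otherwise queue it behind the
    // walk currently in flight.
    void request(const WalkRequest &req) {
        if (!inFlight_)
            inFlight_ = req;
        else
            pending_.push_back(req);
    }

    // Finish the in-flight walk, promote the next queued request (if any),
    // and return the walk that just completed (empty if the walker was idle).
    std::optional<WalkRequest> complete() {
        std::optional<WalkRequest> done = inFlight_;
        inFlight_.reset();
        if (!pending_.empty()) {
            inFlight_ = pending_.front();
            pending_.pop_front();
        }
        return done;
    }

    bool busy() const { return inFlight_.has_value(); }
    std::size_t queued() const { return pending_.size(); }

  private:
    std::optional<WalkRequest> inFlight_;  // the single outstanding walk
    std::deque<WalkRequest> pending_;      // walks waiting their turn
};
```

Under this discipline, a second miss (for example to 0x4 while 0x0 is being walked) waits in the queue instead of becoming a second outstanding walk.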
>> 
>> R
>> 
>>> ncreases linearly for varying periods of time:
>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png [1]
>>> After enabling the TLB debug flag, I see that the linear increase in instructions in flight is proportional to the number of TLB misses. These TLB misses have a much larger delay (resulting in translation delays) because DramSim2 models the memory system more accurately. It seems that with the classic memory system, TLB misses often do not incur translation delays. For whatever reason, it would also seem that every instruction that has a TLB miss is eventually squashed...
>>> 
>>> From a data side perspective this is reasonable. While a miss is ou
>> some point instructions will stop committing and thus the instructions in flight will begin to rise until the miss is satisfied.
>> 
>> Here's a summary of outputs from my trace. These two DPRINTF messages appear on the rising slopes (repeated up until the peak):
>> TLB
>> 
>>> r 0x4(656)
>>> 
>>> This is interesting/odd. I don't know a good reason why (1) a miss would be outstanding to both address 0 and address 4 at the same time. In almost all cases these pages are marked as no-access to detect segfaults. Perhaps there is a
>> the cpu is getting into a loop faulting on a bad access and then faulting again on the fault handler. I could imagine this would happen if there was some corruption in the memory system (for example the timings in dramsim exposing a bug in the cache models or something).
>> 
>> At the peak, the following message appears (from fetch) almost every tick for (what I believe to be) every single one of the table walkers that were squashed:
>> Fetch is waiting ITLB walk to finish!
>> 
>> There must be another walk in flight? The instruction side will only have one fault outstanding at once. Successive branch mispredicts will
>> 
>>> nd "does the right thing."
>>> 
>>> The problem is that these ITLB table walks are for instructions that wer
>> much as 0.3 billion cycles earlier, and have since been removed from the CPU's instruction list.
>> 
>> I'm not following here. 
>> 
>> Any help will be greatly appreciated in solving this problem. I've hit a roadblock with getting Ruby working with ARM, most likely due to the fact
>> 
>>> the 64 MB for the boot loader. I brought this up in my last email about trying to get Ruby working. Therefore, I'm trying to get this DramSim2 integration fixed so I can start modeling FS with D
>> 
>> 
>> Brad/Steve/Nilay, does anyone have a suggestion on how to make this work?
>> 
>> Note that these problem
>> 
>>> rtion). Due to time constraints, I haven't tested on other benchmarks.
>>> Thanks,
>>> Andrew
>>> 
>>> On Tue, May 1, 2012 at 4:27 AM, Andrew Cebulski <af...@drexel.edu [2]> wrote:
>>> 
>>> Hey Gabe,
>>> Thanks for this.
>> l. I just recently got back into debugging this problem. I made a small change in src/base/refcnt.hh to allow me to return the current count of references to a DynInst object.
>> I the
>> 
>>> extra visibility.
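Andrew's change exposes the current reference count for debugging. Here is a simplified C++ sketch of what such an accessor might look like on an intrusive reference-counted base class. RefCounted, RefPtr and getCount() are stand-ins written for illustration, not the exact contents of src/base/refcnt.hh in this gem5 revision.

```cpp
#include <cassert>
#include <utility>

// Simplified intrusive reference counting, in the spirit of
// src/base/refcnt.hh. Names are illustrative assumptions.
class RefCounted {
  public:
    RefCounted() = default;
    RefCounted(const RefCounted &) = delete;
    RefCounted &operator=(const RefCounted &) = delete;
    virtual ~RefCounted() = default;

    void incref() const { ++count_; }
    void decref() const {
        if (--count_ == 0)
            delete this;  // last reference gone: destroy the object
    }
    // Debugging aid: number of smart pointers currently referring to us.
    int getCount() const { return count_; }

  private:
    mutable int count_ = 0;
};

// Smart pointer that bumps the count on copy and drops it on destruction.
template <class T>
class RefPtr {
  public:
    RefPtr() = default;
    explicit RefPtr(T *p) : p_(p) { if (p_) p_->incref(); }
    RefPtr(const RefPtr &o) : p_(o.p_) { if (p_) p_->incref(); }
    RefPtr &operator=(RefPtr o) {  // copy-and-swap
        std::swap(p_, o.p_);
        return *this;
    }
    ~RefPtr() { if (p_) p_->decref(); }

    T *operator->() const { return p_; }
    T *get() const { return p_; }

  private:
    T *p_ = nullptr;
};
```

Printing getCount() when an instruction is removed from the CPU's list shows directly whether some other holder still references it.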
>>> I've found one memory store instruction that seems to be getting lost. It progresses as far as being executed in the IEW once, but a delayed translation occurs, deferring the store. By the time it reenters the IEW, the IQ has marked the instruction as squashed. Everything proceeds as usual from there on, with one exception: when the instruction is removed from the CPU's instruction list, one reference is left hanging.
>>> I've added some additional debugging to my traces to help narrow down where this reference is coming from. As far as I can tell, it's due to a call to initiateAcc() within the executeStore function in the LSQ unit. Please see the following two traces. The first trace shows what I just described. The second trace is another memory store instruction that got squashed; however, it was squashed upon its first entry into the IEW, so it never started execution.
>>> http://dl.dropbox.com/u/2953302/gem5/lostinstruction.out [21]
>>> http://dl.dropbox.com/u/2953302/gem5/similarinstruction.out [22]
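The suspected leak can be modeled as a deferred translation that keeps a reference to its instruction. In this hypothetical sketch, std::shared_ptr stands in for DynInstPtr, and Lsq, PendingTranslation and squash() are illustrative names, not gem5 classes. The rule that squashing drops entries with a sequence number greater than the squash point is an assumption modeled on O3's usual "squash younger instructions" convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model: a delayed translation parks a reference to its
// instruction. std::shared_ptr stands in for DynInstPtr.
struct DynInst {
    std::uint64_t seqNum;
    bool squashed = false;
};
using DynInstPtr = std::shared_ptr<DynInst>;

struct PendingTranslation {
    DynInstPtr inst;  // reference held until the translation is resolved
};

class Lsq {
  public:
    // A store whose translation is delayed keeps a reference here.
    void deferStore(const DynInstPtr &inst) { pending_.push_back({inst}); }

    // On squash, drop the references held by deferred translations of
    // squashed instructions (assumed: seqNum > squashSeq). Without this
    // step the reference outlives the instruction's removal from the
    // CPU's instruction list.
    void squash(std::uint64_t squashSeq) {
        std::vector<PendingTranslation> kept;
        for (auto &p : pending_) {
            if (p.inst->seqNum > squashSeq)
                p.inst->squashed = true;  // not kept: reference released below
            else
                kept.push_back(std::move(p));
        }
        pending_ = std::move(kept);  // destroys the squashed entries
    }

    std::size_t pendingCount() const { return pending_.size(); }

  private:
    std::vector<PendingTranslation> pending_;
};
```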
>>>
Let me know if you have any ideas based on these two instruction traces.
I do not understand how the initiateAcc function results in another
reference, but maybe someone else does.... Since I don't see how it
makes a reference, it's hard to find out how to make sure it gets
dereferenced... 
>>> Unfortunately, I haven't been able to add a DPRINTF
in src/base/refcnt.hh ...this would make things more clear (i.e. exactly
when references/deferences occur). Let me know if you have any advice on
this...if it's possible. I can't seem to get the right include files,
and likely right SConscript compile order... 
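One possible workaround is a compile-time switch with plain std::fprintf to stderr. This sketch assumes the obstacle is pulling gem5's debug-flag headers into a low-level header like src/base/refcnt.hh. REFCNT_TRACE, REFCNT_LOG and TracedRefCounted are hypothetical names. Each line prints the object address ("this"), as Gabe suggests, together with the new count.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical tracing switch: compile with -DREFCNT_TRACE to enable output.
// Uses only <cstdio>, so it needs none of gem5's DPRINTF machinery.
#ifdef REFCNT_TRACE
#define REFCNT_LOG(obj, op, cnt)                                        \
    std::fprintf(stderr, "refcnt %p %s -> %d\n",                        \
                 const_cast<void *>(static_cast<const void *>(obj)),    \
                 op, cnt)
#else
#define REFCNT_LOG(obj, op, cnt) ((void)0)
#endif

class TracedRefCounted {
  public:
    virtual ~TracedRefCounted() = default;

    void incref() const {
        ++count_;
        REFCNT_LOG(this, "incref", count_);
    }
    void decref() const {
        --count_;
        REFCNT_LOG(this, "decref", count_);
        if (count_ == 0)
            delete this;  // last reference released
    }
    int getCount() const { return count_; }

  private:
    mutable int count_ = 0;
};
```

Because the output goes straight to stderr without a tick stamp, it is easiest to correlate with the gem5 trace when both are written to the same terminal or file.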
>>> Thanks,
>>> Andrew
>>> 
>>> On Sat, Apr 7, 2012 at 9:48 PM, Gabe Black <gbl...@eecs.umich.edu [23]> wrote:
>>> 
>>>> Without digging into things too deeply, it looks like you may be leaking references to dynamic instructions. The CPU may think it's done with one, but until that final reference is removed, the object will hang around forever. I think I've had problems before where the reference count ended up off by one somehow and instructions would start piling up. It's also possible that a clog develops in O3's pipeline and some internal structure stops letting instructions through and starts accumulating them. Either of these problems will be annoying to track down, but with enough digging I've been able to fix these sorts of things.
>>>> 
>>>> This may have more to do with O3 not handling the benchmark you're running well than with a problem in your new DRAM model. There may be some interaction between the two, though, where the new memory makes the timing line up to cause O3 to behave poorly. What you can do is instrument dynamic instruction creation, destruction, and reference counting (try printing "this" for both the reference-counting wrapper and the dyn inst itself) and turn it on as close as you can to where things go bad, tick-wise. Then look for an instruction that gets lost, and look for where its reference count is incremented and decremented. It should be relatively easy to pair up where references are created and destroyed, and you should be able to identify the reference that never goes away. Then you need to figure out where that reference is being created. After that, you should have enough information to identify why the reference counting isn't being done correctly. It's arduous, but that's the only way.
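The pairing step Gabe describes can be automated once each incref/decref is logged with the object's address. This sketch assumes a hypothetical log line format, "refcnt <address> <incref|decref> -> <count>", which is not real gem5 output. It reports every address whose increments and decrements do not balance.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Read trace lines of the assumed form "refcnt <addr> <incref|decref> -> <n>"
// and return the addresses whose net reference change is nonzero, i.e.
// candidates for a reference that is never released (or released twice).
std::vector<std::string> unbalancedObjects(std::istream &trace) {
    std::map<std::string, int> net;  // address -> increfs minus decrefs
    std::string line;
    while (std::getline(trace, line)) {
        std::istringstream fields(line);
        std::string tag, addr, op;
        if (!(fields >> tag >> addr >> op) || tag != "refcnt")
            continue;  // skip unrelated trace lines
        if (op == "incref")
            ++net[addr];
        else if (op == "decref")
            --net[addr];
    }
    std::vector<std::string> suspects;
    for (const auto &[addr, n] : net)
        if (n != 0)
            suspects.push_back(addr);
    return suspects;
}
```

A positive net count points to a leaked reference. A negative one hints at the premature-release case Gabe describes next.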
>>>> 
>>>> It's also important to make sure reference counts aren't decremented to zero prematurely. I had a problem once where that happened and the memory behind the object was updated by something that didn't know it was dead. The memory had since been reallocated to another object of the same type, so that other object reflected what happened to the phantom one. If I remember correctly, that manifested as something weird like an add causing a page fault.
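For the premature-release case, a few assertions can catch a count that would go negative or an object destroyed while still referenced. CheckedRefCounted is a hypothetical example class, not part of gem5.

```cpp
#include <cassert>

// Hypothetical reference-counted base with debug checks for premature
// release. Not gem5 code.
class CheckedRefCounted {
  public:
    virtual ~CheckedRefCounted() {
        // Destroying an object that someone still references is exactly
        // the "phantom object" situation: fail loudly in debug builds.
        assert(count_ == 0 && "destroyed while references remain");
    }

    void incref() const { ++count_; }

    void decref() const {
        // A decrement with no outstanding references means some holder
        // released a reference it never took (or released it twice).
        assert(count_ > 0 && "decref on an object with no references");
        if (--count_ == 0)
            delete this;
    }

    int getCount() const { return count_; }

  private:
    mutable int count_ = 0;
};
```

These checks disappear in builds that define NDEBUG (the thread notes that gem5.fast runs did not report assertions). They also cannot see writes through stale pointers after the memory is reused. A memory-error detector such as AddressSanitizer (-fsanitize=address in GCC and Clang) is a complementary way to catch that use-after-free.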
>>>> 
>>>> Gabe 
>>>> 
>>>> On 04/07/12 18:21, Andrew Cebulski wrote:
>>>> 
>>>>> Hi all,
>>>>> I've looked into this problem some more, and have put together a couple of traces. I've been getting more familiar with how gem5 handles dynamic instructions, in particular how it destroys them. I have two traces to compare, one with the physical memory and the other with the integrated dramsim2 DRAM memory. I also have two plots showing instruction counts over time (sim ticks). All of these are linked at the end of the email.
>>>>> First, I'm going to go into what I've been able to interpret regarding how instructions are destroyed, in particular comparing when DynInsts vs. DynInstPtrs are destructed/removed from the CPU. I separate these because I've seen a difference, as I discuss later. These explanations are essentially non-existent on the wiki; there is a section header waiting to be filled...
>>>>> From what I have been able to gather from the code, there is a list of all the instructions in flight in cpu/o3/cpu.cc called instList, with the type DynInstPtr. There are three conditions under which instructions are cleaned from this list:
>>>>> 1.) The ROB retires its head instruction.
>>>>> 2.) Fetch receives a ROB squashing signal from commit, resulting in removal of any instruction not in the ROB.
>>>>> 3.) Decode detects an incorrect branch prediction, resulting in removal of all instructions back to the bad seq num.
>>>>> Once all five stages have completed, the CPU cleans up all the removed in-flight instructions. This line in particular, in cleanUpRemovedInsts() in cpu/o3/cpu.cc, destroys a DynInstPtr:
>>>>> instList.erase(removeList.front());
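The bookkeeping Andrew describes can be sketched as a simplified model under stated assumptions. std::shared_ptr stands in for DynInstPtr, and CpuModel, addInst() and markForRemoval() are hypothetical names. Only instList, removeList and cleanUpRemovedInsts() come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>
#include <queue>

// Simplified model of the O3 in-flight instruction bookkeeping (a sketch,
// not the actual gem5 code). std::shared_ptr stands in for DynInstPtr.
struct DynInst {
    std::uint64_t seqNum;
};
using DynInstPtr = std::shared_ptr<DynInst>;
using ListIt = std::list<DynInstPtr>::iterator;

class CpuModel {
  public:
    // Every new in-flight instruction is appended to instList.
    ListIt addInst(std::uint64_t seqNum) {
        return instList.insert(instList.end(),
                               std::make_shared<DynInst>(DynInst{seqNum}));
    }

    // Called when commit retires an instruction or a squash removes it;
    // the actual erase is deferred until the end of the cycle.
    void markForRemoval(ListIt it) { removeList.push(it); }

    // End-of-cycle cleanup. Erasing drops only the list's reference; the
    // DynInst is destroyed when its last reference goes away.
    void cleanUpRemovedInsts() {
        while (!removeList.empty()) {
            instList.erase(removeList.front());
            removeList.pop();
        }
    }

    std::list<DynInstPtr> instList;  // all instructions in flight
    std::queue<ListIt> removeList;   // entries to erase this cycle
};
```

Because removeList holds iterators into instList, each instruction must be queued for removal at most once. Erasing the same iterator twice is undefined behavior.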
>>>>> When I turn on the O3CPU debug flag, I see the message "Removing instruction, ..." (from o3/cpu.cc) with the threadNum, seqNum and pcState after all 5 CPU stages have completed and one of the conditions above is met. I also see which tick it occurs on.
>>>>> When I turn on the DynInst debug flag, I see when instructions are created and destroyed (cpu/base_dyn_inst_impl.hh) and on which tick. From analyzing the trace files, I've gathered that this takes into account that instructions have different execution lengths. So if a memory instruction's DynInstPtr is removed from the instList on one tick, the DynInst for that memory instruction is destroyed much later (e.g., 1M ticks later). I have yet to determine how this is implemented.
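Reference counting would account for this delay. Erasing the DynInstPtr from instList drops only one reference, and the DynInst is destroyed when its last holder releases its pointer. Which structure holds that pointer in gem5 is an assumption here; an outstanding memory access is one plausible candidate. In this hypothetical sketch, std::shared_ptr stands in for DynInstPtr and a static live-object counter stands in for cpu->instcount.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dynamic instruction whose constructor and
// destructor maintain a live-object count (like cpu->instcount).
struct DynInst {
    static inline int liveCount = 0;  // constructed but not yet destroyed
    explicit DynInst(std::uint64_t s) : seqNum(s) { ++liveCount; }
    ~DynInst() { --liveCount; }
    DynInst(const DynInst &) = delete;
    DynInst &operator=(const DynInst &) = delete;
    std::uint64_t seqNum;
};
using DynInstPtr = std::shared_ptr<DynInst>;

// Remove an instruction from the list while a second holder (standing in
// for, e.g., an outstanding memory access) still references it. Returns
// {list size, live DynInsts} observed at that moment.
std::pair<std::size_t, int> removeWhileHeld() {
    std::list<DynInstPtr> instList;
    std::vector<DynInstPtr> otherHolder;
    instList.push_back(std::make_shared<DynInst>(7));
    otherHolder.push_back(instList.back());
    instList.clear();  // gone from the list...
    return {instList.size(), DynInst::liveCount};  // ...but still alive
}  // otherHolder is destroyed here, releasing the last reference
```

If the second holder never releases its reference, the two counts in the plots would drift apart in the way the dramsim2 trace shows.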
>>>>> Now for the problem.
>>>>> What I'm seeing when I run with the dramsim2 DRAM memory is a significant difference between the size of the instList (of DynInstPtr objects) and the dynamic instruction count (of DynInst objects). The benchmark I'm running is libquantum from SPEC 2006. For roughly the first 130B ticks, the dynamic instruction count kept in cpu/base_dyn_inst_impl.hh shadows the instList size in o3/cpu.cc (figure linked below) very closely. Around tick 130B after libquantum started, it starts hitting what I'm assuming are loops (and therefore branch prediction), resulting in behavior that seems to imply improper instruction handling (i.e., more instructions in flight than the ROB allows).
>>>>> I wasn't able to sync up the physical and dramsim2 traces exactly, but they should represent roughly the same area of execution. They don't execute identically because dramsim2 models the memory differently (i.e., latency and other delays).
>>>>> I've shared both traces on my public Dropbox here:
>>>>> http://dl.dropbox.com/u/2953302/gem5/physical-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU.out.gz [14]
>>>>> http://dl.dropbox.com/u/2953302/gem5/dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz [15]
>>>>> Here are a couple of plots of tick versus instruction count, with respect to cpu->instcount in cpu/base_dyn_inst_impl.hh and instList.size() in cpu/o3/cpu.cc:
>>>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_physical.png [16]
>>>>> 
>>>>> http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png [17]
>>>>> Note that I added the printout of the instList size to an existing O3CPU DPRINTF in cleanUpRemovedInsts() in cpu/o3/cpu.cc.
>>>>> Here are the commands I ran to parse the traces into data files to analyze in MATLAB and create the plots:
>>>>> zgrep DynInst dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz | grep destroyed | awk '{print $1,$11}' > cpuinstcount.out
>>>>> zgrep instList dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz | awk '{print $1,$11}' > instlistsize.out
>>>>> It seems to me like the problem might lie in gem5, but has just been exposed by integrating this more detailed memory model, dramsim2, into gem5. Either that, or there are some timing errors in how dramsim2 was integrated. I doubt this, however, since those first 190B ticks executed used the dramsim2 memory. I believe the problem is a combination of memory instructions + complex loops (branch prediction), resulting in instructions being destroyed improperly.
>>>>> I've included the ROB, Commit, Fetch, DynInst and O3CPU debug flags. There are 192 ROB entries, which is why the instList size generally peaks at about 192 instructions. The dynamic instruction counts (seen in the dramsim2 plot) also seem to imply that instructions are incorrectly being removed from the ROB, and then from the CPU's instruction list in cpu.cc, which allows more and more instructions to be added to the system (possibly from a bad branch).
>>>>> I appreciate any help in debugging this and further figuring out the root problem; just let me know if you need anything else from me. I don't have much more time at the moment to debug, but I can take any advice for quick changes and/or additional traces, then send the results back to the list for discussion.
>>>>> Thanks,
>>>>> Andrew
>>>>> P.S. Paul - I did try decreasing the size of the dramsim2 transaction (and even command) queue from 512 to 32. The same instruction problem occurred. It basically just decreased the execution time.
>>>>> 
>>>>> On Wed, Mar 14, 2012 at 2:10 PM, Ali Saidi <sa...@umich.edu [18]> wrote:
>>>>> 
>>>>>> The error is that there are more than 1500 instructions currently in flight in the system. It could mean several things:
>>>>>> 
>>>>>> 1. The value is somewhat arbitrarily defined, and maybe there are more than 1500 in your system at one time?
>>>>>> 
>>>>>> 2. Instructions aren't being destroyed correctly.
>>>>>> 
>>>>>> You could try to run a debug binary so you'll get a list of instructions when it happens, or increase the number, which may be appropriate for certain situations (but 1500 is quite a few in-flight instructions).
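Ali's explanation fits a common pattern. A counter is incremented when a dynamic instruction is constructed and decremented when it is destroyed, and it is checked against a fixed limit. The 1500 limit comes from Ali's reply. CpuCounters and CountedInst are hypothetical names, and the exact comparison is an assumption because the assertion text quoted in this thread is cut off.

```cpp
#include <cassert>

// Hypothetical per-CPU counter with an assumed hard limit of 1500 live
// dynamic instructions.
struct CpuCounters {
    int instcount = 0;
    static constexpr int kMaxInstsInFlight = 1500;
};

class CountedInst {
  public:
    explicit CountedInst(CpuCounters &cpu) : cpu_(cpu) {
        ++cpu_.instcount;
        // Fires when far more instructions are alive than the pipeline
        // structures could plausibly hold, which usually indicates a leak.
        assert(cpu_.instcount <= CpuCounters::kMaxInstsInFlight);
    }
    ~CountedInst() { --cpu_.instcount; }
    CountedInst(const CountedInst &) = delete;
    CountedInst &operator=(const CountedInst &) = delete;

  private:
    CpuCounters &cpu_;
};
```

Because assert() is compiled out when NDEBUG is defined, a leak like this could go unreported in a build such as gem5.fast. That matches Andrew's experience described further down the thread.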
>>>>>> 
>>>>>> Ali 
>>>>>> 
>>>>>> On 13.03.2012 10:56, Andrew Cebulski wrote:
>>>>>> 
>>>>>>> Hi Xiangyu,
>>>>>>> I just started looking into this some more. At first I thought it was due to updating to a more recent revision, but then I went back to revision 8643, added your patch, built and ran... and now I get the error with it too (when running ARM_FS/gem5.opt). I'm testing now to see whether an update to SWIG might have resulted in this error; maybe someone on the mailing list would know if that's possible. The difference is 1.3.40 vs. 2.0.3, both of which are supported according to the dependencies wiki page.
>>>>>>> Just for completeness, here's the error from revision 8643:
>>>>>>> build/ARM_FS/cpu/base_dyn_inst_impl.hh:149: void BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion `cpu->instcount
>>>>>>> 
>>>>>>> I have not tried running with gem5.debug, so I will be doing that today. Maybe this is an assertion that is occurring due to an optimization. That would mean it wouldn't be triggered in gem5.debug, since it runs without optimizations. Have you tested all of debug, opt and fast with your tests?
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>> 
>>>>>>> On Tue, Mar 13, 2012 at 1:37 PM, Rio Xiangyu Dong <riosher...@gmail.com [11]> wrote:
>>>>>>> 
>>>>>>>> Hi Andrew,
>>>>>>>> 
>>>>>>>> I didn't see this error in my simulations. May I ask which gem5 version you are using? I find that some of the latest code updates are not compatible with my changes. I am still using the DRAMsim2 patch on gem5 revision 8643, and have run all the runnable benchmarks in SPEC2006, SPEC2000, EEMBC2, and PARSEC2 on ARM_SE.
>>>>>>>> 
>>>>>>>> Thank you! 
>>>>>>>> 
>>>>>>>> Best, 
>>>>>>>> 
>>>>>>>> Xiangyu
>>>>>>>> 
>>>>>>>> FROM: Andrew Cebulski [mailto:af...@drexel.edu [8]]
>>>>>>>> SENT: Thursday, March 08, 2012 6:52 PM
>>>>>>>> TO: gem5 users mailing list
>>>>>>>> CC: riosher...@gmail.com [9]; sa...@umich.edu [10]
>>>>>>>> SUBJECT: Re: [gem5-users] A Patch for DRAMsim2 Integration
>>>>>>>> 
>>>>>>>> Xiangyu,
>>>>>>>> 
>>>>>>>> I've been having an issue recently with the number of instructions I've been seeing committed to the CPU (I have a separate thread on this). It turns out the issue seems to be coming from this patch you created to integrate DramSim2 with gem5. Unfortunately, I've been running with gem5.fast, not gem5.opt, so up until now I haven't been seeing assertions. I thought I'd run it with gem5.opt or debug back in December, but I must not have. My runs on the ARM O3 CPU fail with this assertion:
>>>>>>>> 
>>>>>>>> build/ARM/cpu/base_dyn_inst_impl.hh:149: void BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion `cpu->instcount
>>>>>>>> 
>>>>>>>> -Andrew
>>>>>>>> 
>>>>>>>>> Date: Sun, 18 Dec 2011 01:48:58 -0800
>>>>>>>>> From: "Dong, Xiangyu" <riosher...@gmail.com [3]>
>>>>>>>>> To: "gem5 users mailing list" <gem5-users@gem5.org [4]>
>>>>>>>>> Subject: [gem5-users] A Patch for DRAMsim2 Integration
>>>>>>>>> Message-ID: gmail.com [5]>
>>>>>>>>> 
>>>>>>>>> Content-Type: text/plain; charset="us-ascii"
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I have a Gem5+DRAMsim2 patch. I've tested it under both SE and FS modes. I'm willing to share it here.
>>>>>>>>> 
>>>>>>>>> For those who have such needs, please go to my website www.cse.psu.edu/~xydong [6] to download the patch and test it. To enable DRAMSim2, use the se_dramsim2.py script instead of se.py (for FS, you can create one yourself). The basic idea to enable the DRAMsim2 module is to use the derived DRAMMemory class instead of the PhysicalMemory class.
>>>>>>>>> 
>>>>>>>>> Please let me know if there are bugs.
>>>>>>>>> 
>>>>>>>>> Thank you!
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> 
>>>>>>>>> Xiangyu Dong
>>>>>>>>> 
>>>>>>>>> -------------- next part --------------
>>>>>>>>> An HTML attachment was scrubbed...
>>>>>>>>> URL: <http://m5sim.org/cgi-bin/mailman/private/gem5-users/attachments/20111218/f3fdf5da/attachment.html [7]>
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> gem5-users mailing list
>>>>>>> gem5-users@gem5.org [12]
>>>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users [13]




Links:
------
[1] http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png
[2] mailto:af...@drexel.edu
[3] mailto:riosher...@gmail.com
[4] mailto:gem5-users@gem5.org
[5] http://gmail.com
[6] http://www.cse.psu.edu/%7Exydong
[7] http://m5sim.org/cgi-bin/mailman/private/gem5-users/attachments/20111218/f3fdf5da/attachment.html
[8] mailto:af...@drexel.edu
[9] mailto:riosher...@gmail.com
[10] mailto:sa...@umich.edu
[11] mailto:riosher...@gmail.com
[12] mailto:gem5-users@gem5.org
[13] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
[14] http://dl.dropbox.com/u/2953302/gem5/physical-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU.out.gz
[15] http://dl.dropbox.com/u/2953302/gem5/dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz
[16] http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_physical.png
[17] http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png
[18] mailto:sa...@umich.edu
[19] mailto:gem5-users@gem5.org
[20] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
[21] http://dl.dropbox.com/u/2953302/gem5/lostinstruction.out
[22] http://dl.dropbox.com/u/2953302/gem5/similarinstruction.out
[23] mailto:gbl...@eecs.umich.edu
[24] mailto:gem5-users@gem5.org
[25] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
[26] http://dl.dropbox.com/u/2953302/gem5/tlb.out
[27] mailto:sa...@umich.edu
