Without digging into things too deeply, it looks like you may be leaking references to dynamic instructions. The CPU may think it's done with one, but until that final reference is removed, the object will hang around forever. I think I've had problems before where their reference count ended up off by one somehow and instructions would start piling up.

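Below is a minimal C++ sketch of the off-by-one leak described above. It assumes a hand-rolled intrusive count, and FakeInst, incRef, decRef and simulate are hypothetical names, not gem5 code. If one release is missing for each instruction, the live-object count keeps climbing even though no part of the CPU still needs those objects.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a dynamic instruction (not gem5's class):
// an intrusive reference count plus a counter of live objects.
struct FakeInst {
    static std::size_t live;  // objects constructed but not yet destroyed
    int refs = 0;             // number of outstanding references
    FakeInst() { ++live; }
    ~FakeInst() { --live; }
};
std::size_t FakeInst::live = 0;

void incRef(FakeInst* p) { ++p->refs; }

// Dropping the last reference is the only thing that frees the object.
void decRef(FakeInst* p) {
    if (--p->refs == 0)
        delete p;
}

// Create n instructions that each pick up two references. With leak set,
// one holder forgets its decRef, every count stalls at 1, and the objects
// are never freed (the leak is intentional here). Returns the total number
// of live FakeInst objects afterward, cumulative across calls.
std::size_t simulate(int n, bool leak) {
    for (int i = 0; i < n; ++i) {
        FakeInst* inst = new FakeInst;
        incRef(inst);      // e.g. the CPU's in-flight list
        incRef(inst);      // e.g. some pipeline structure
        decRef(inst);      // the list is done with it
        if (!leak)
            decRef(inst);  // the buggy path skips this release
    }
    return FakeInst::live;
}
```

With `leak` false, the count returns to zero after every batch. With `leak` true, it grows by `n` on each call. That matches the pattern of instructions "piling up" that never get freed.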
It's also possible that a clog develops in O3's pipeline and some internal structure stops letting instructions through and starts accumulating them. Either of these problems will be annoying to track down, but with enough digging I've been able to fix these sorts of things.

This may have more to do with O3 not handling the benchmark you're running well than with a problem in your new DRAM model. There may be some interaction between the two, though, where the new memory makes the timing line up in a way that causes O3 to behave poorly.

What you can do is instrument dynamic instruction creation and destruction and reference counting (try printing "this" for both the reference counting wrapper and the dyn inst itself) and turn it on as close as you can, tick-wise, to where things go bad. Then look for an instruction which gets lost, and look for where its reference count is incremented and decremented. It should be relatively easy to pair up where references are created and destroyed, and you should be able to identify the reference which never goes away. Then you need to figure out where that reference is being created. After that, you should have enough information to identify why the reference counting isn't being done correctly. It's arduous, but that's the only way.

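The pairing procedure above can be sketched in standalone C++. This sketch assumes an intrusive pointer that records each acquire and release together with its own address ("this") and the address of the instruction. TracedPtr, RefEvent, g_trace and unpairedHolders are hypothetical names. The events go into a vector rather than a debug print, which keeps the pairing step visible. In gem5 you would presumably print trace lines instead and pair them offline.

```cpp
#include <cassert>
#include <map>
#include <vector>

// One trace record: which wrapper touched which object, and how.
struct RefEvent {
    const void* wrapper;  // address of the smart pointer ("this")
    const void* target;   // address of the instruction it refers to
    int delta;            // +1 on acquire, -1 on release
};

std::vector<RefEvent> g_trace;  // hypothetical in-memory trace sink

struct Inst { int refs = 0; };  // stand-in for a dynamic instruction

// Minimal intrusive pointer that records every reference change.
class TracedPtr {
  public:
    explicit TracedPtr(Inst* p = nullptr) : ptr(p) { acquire(); }
    TracedPtr(const TracedPtr& o) : ptr(o.ptr) { acquire(); }
    TracedPtr& operator=(const TracedPtr& o) {
        if (ptr != o.ptr) {
            release();
            ptr = o.ptr;
            acquire();
        }
        return *this;
    }
    ~TracedPtr() { release(); }

  private:
    void acquire() {
        if (!ptr) return;
        ++ptr->refs;
        g_trace.push_back({this, ptr, +1});
    }
    void release() {
        if (!ptr) return;
        g_trace.push_back({this, ptr, -1});
        if (--ptr->refs == 0) delete ptr;
        ptr = nullptr;
    }
    Inst* ptr;
};

// Sum the +1/-1 events per wrapper for one instruction. Any wrapper with
// a nonzero total acquired a reference it never released: that is the
// reference keeping the instruction alive.
std::vector<const void*> unpairedHolders(const void* target) {
    std::map<const void*, int> net;
    for (const RefEvent& e : g_trace)
        if (e.target == target)
            net[e.wrapper] += e.delta;
    std::vector<const void*> out;
    for (const auto& kv : net)
        if (kv.second != 0)
            out.push_back(kv.first);
    return out;
}
```

A stack-allocated wrapper's address can be reused later by a different wrapper. Per-address totals still balance as long as each wrapper releases what it acquired, so the leftover entries point at the reference that never goes away.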
It's also important to make sure reference counts aren't decremented to zero prematurely. I had a problem once where that happened and the memory behind the object was updated by something that didn't know it was dead. The memory had since been reallocated to another object of the same type, so that other object reflected what happened to the phantom one. If I remember correctly, that manifested as something weird like an add causing a page fault.

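A premature zero is hard to catch because the bad write itself doesn't crash anything. One common debugging technique is to hand out generation-checked handles, so that a write through a stale reference is detected before it lands on whatever object reused the memory. This is an assumed approach, not something gem5 does. InstPool, Slot and Handle below are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical recycling pool: a freed slot is handed to the next
// allocation, much like freed memory being reused for a new object of
// the same type. Each slot carries a generation bumped on every release.
struct Slot {
    std::uint32_t gen = 0;
    bool inUse = false;
    int value = 0;  // stand-in for instruction state
};

// A handle remembers which slot it names and which generation it saw.
struct Handle {
    std::size_t idx;
    std::uint32_t gen;
};

class InstPool {
  public:
    explicit InstPool(std::size_t n) : slots(n) {}

    Handle alloc() {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (!slots[i].inUse) {
                slots[i].inUse = true;
                slots[i].value = 0;
                return Handle{i, slots[i].gen};
            }
        }
        assert(false && "pool exhausted");
        return Handle{0, 0};
    }

    void release(Handle h) {
        assert(valid(h));
        slots[h.idx].inUse = false;
        ++slots[h.idx].gen;  // invalidates every outstanding handle
    }

    // True only while h still names the object it was issued for.
    bool valid(Handle h) const {
        return slots[h.idx].inUse && slots[h.idx].gen == h.gen;
    }

    // A write through a stale handle is refused instead of silently
    // landing on whichever object now occupies the slot.
    bool write(Handle h, int v) {
        if (!valid(h)) return false;
        slots[h.idx].value = v;
        return true;
    }

    int read(Handle h) const { return slots[h.idx].value; }

  private:
    std::vector<Slot> slots;
};
```

In this model, the "phantom" update Gabe describes corresponds to `write` being called through a handle whose generation no longer matches its slot.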
Gabe
On 04/07/12 18:21, Andrew Cebulski wrote:
Hi all,
I've looked into this problem some more and have put together a couple of traces. I've become more familiar with how gem5 handles dynamic instructions, in particular how it destroys them. I have two traces to compare, one with the physical memory and the other with the integrated DRAMSim2 memory. I also have two plots showing instruction counts over time (sim ticks). All of these are linked at the end of the email.

First, I'm going to go into what I've been able to interpret regarding how instructions are destroyed, in particular comparing when DynInsts vs. DynInstPtrs are destructed/removed from the CPU. I separate these because I've seen a difference, as I discuss later. These explanations are essentially nonexistent on the wiki; there is a section header waiting to be filled...

From what I have been able to gather from the code, there is a list of all the instructions in flight in cpu/o3/cpu.cc called instList, with the type DynInstPtr. There are three conditions under which instructions are cleaned from this list:
1.) The ROB retires its head instruction.
2.) Fetch receives a ROB squash signal from commit, resulting in the removal of any instruction not in the ROB.
3.) Decode detects an incorrect branch prediction, resulting in the removal of all instructions back to the bad seq num.

Once all five stages have completed, the CPU cleans up all the removed in-flight instructions. This line in cleanUpRemovedInsts() in cpu/o3/cpu.cc in particular destroys a DynInstPtr:
instList.erase(removeList.front());
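
The mark-then-sweep pattern described above can be modelled in a few lines. This is a simplified sketch that assumes instList is a std::list of reference-counted pointers and removeList is a queue of iterators into that list. std::shared_ptr stands in for DynInstPtr, and MiniCPU and squashAfter are hypothetical. Erasing a list entry drops only the list's own reference to the instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <memory>
#include <queue>

// Simplified stand-ins; the real O3 types carry far more state.
struct DynInst {
    std::uint64_t seqNum;
};
using DynInstPtr = std::shared_ptr<DynInst>;
using ListIt = std::list<DynInstPtr>::iterator;

struct MiniCPU {
    std::list<DynInstPtr> instList;  // every in-flight instruction
    std::queue<ListIt> removeList;   // marked for removal this cycle

    ListIt addInst(std::uint64_t sn) {
        instList.push_back(std::make_shared<DynInst>(DynInst{sn}));
        return std::prev(instList.end());
    }

    // Retirement, a commit squash, or a decode misprediction only
    // *marks* an instruction; nothing is erased while stages still run.
    // Each instruction must be marked at most once, since erasing the
    // same iterator twice is undefined behaviour.
    void markRemoved(ListIt it) { removeList.push(it); }

    // Hypothetical helper: mark everything younger than seqNum sn.
    void squashAfter(std::uint64_t sn) {
        for (ListIt it = instList.begin(); it != instList.end(); ++it)
            if ((*it)->seqNum > sn)
                markRemoved(it);
    }

    // After all stages have run, drop the list's references.
    void cleanUpRemovedInsts() {
        while (!removeList.empty()) {
            instList.erase(removeList.front());
            removeList.pop();
        }
    }
};
```

std::list keeps iterators to other elements valid across push_back and erase. This is why a queue of iterators can safely be built up during the cycle and then consumed at the end.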
When I turn on the O3CPU debug flag, I see the message "Removing instruction, ..." (from o3/cpu.cc) with the threadNum, seqNum and pcState after all 5 CPU stages have completed and one of the conditions above is met. I also see what tick it occurs on.

When I turn on the DynInst debug flag, I see when instructions are created and destroyed (cpu/base_dyn_inst_impl.hh) and on what tick. From analyzing the trace files, I've gathered that this takes into account that instructions have different execution lengths. So if a memory instruction's DynInstPtr is removed from the instList on one tick, the DynInst for that memory instruction will be destroyed much later (e.g. 1M ticks later). I have yet to determine how this is implemented.
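
One plausible explanation for the delay, consistent with the reference-counting view in the reply above, is that other structures still hold their own references after instList lets go. A DynInst is destroyed only when the last of those references disappears. The sketch below uses std::shared_ptr in place of the reference-counted DynInstPtr. PendingRequest and eraseFromList are hypothetical and stand for anything, such as an outstanding memory access, that keeps a load alive until its response arrives.

```cpp
#include <cassert>
#include <list>
#include <memory>

// Counts live instruction objects, loosely like the cpu->instcount value
// in the DynInst trace; a standalone stand-in, not gem5 code.
struct DynInst {
    static int instcount;
    DynInst() { ++instcount; }
    ~DynInst() { --instcount; }
};
int DynInst::instcount = 0;

// std::shared_ptr stands in for the reference-counted DynInstPtr.
using DynInstPtr = std::shared_ptr<DynInst>;

// Hypothetical outstanding memory request: it holds its own reference
// to the load until the (possibly slow) DRAM response comes back.
struct PendingRequest {
    DynInstPtr inst;
};

// Drop the CPU's references and report how many objects survive.
int eraseFromList(std::list<DynInstPtr>& instList) {
    instList.clear();
    return DynInst::instcount;
}
```

Under this model, a slower memory stretches the gap between the "Removing instruction" message and the DynInst "destroyed" message. A reference that is never released makes the gap permanent.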
Now for the problem.
What I'm seeing when I run with the DRAMSim2 memory is a significant difference between the size of instList (DynInstPtr objects) and the dynamic instruction count (DynInst objects). The benchmark I'm running is libquantum from SPEC 2006. For roughly the first 130B ticks, the dynamic instruction count kept in cpu/base_dyn_inst_impl.hh shadows the instList size in o3/cpu.cc very closely (see the figures linked below). Around tick 130B after libquantum started, it starts hitting what I'm assuming are loops (and therefore branch prediction), resulting in behavior that seems to imply improper instruction handling (i.e. more instructions in flight than the ROB allows).

I wasn't able to sync up the physical and DRAMSim2 traces exactly, but they should represent roughly the same region of execution. They don't execute identically because DRAMSim2 models memory differently (i.e. latency and other delays).

I've shared both traces on my public Dropbox here:
http://dl.dropbox.com/u/2953302/gem5/physical-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU.out.gz
http://dl.dropbox.com/u/2953302/gem5/dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz
Here are a couple of plots of tick versus instruction count, with respect to cpu->instcount in cpu/base_dyn_inst_impl.hh and instList.size() in cpu/o3/cpu.cc:
http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_physical.png
http://dl.dropbox.com/u/2953302/gem5/dyninst_vs_dyninstptr_dramsim2.png
Note that I added the printout of the instList size to an existing
O3CPU DPRINTF in cleanUpRemovedInsts() in cpu/o3/cpu.cc.
Here are the commands I ran to parse the traces into data files to analyze in MATLAB and create the plots:

    zgrep DynInst dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz | grep destroyed | awk '{print $1,$11}' > cpuinstcount.out
    zgrep instList dramsim2-fs-040612-ROB-Commit-DynInst-Fetch-O3CPU-2.out.gz | awk '{print $1,$11}' > instlistsize.out

It seems to me that the problem might lie in gem5 but has only been exposed by integrating this more detailed memory model, DRAMSim2, into gem5. Either that, or there are some timing errors in how DRAMSim2 was integrated. I doubt this, however, since those first 190B ticks executed using the DRAMSim2 memory. I believe the problem is a combination of memory instructions and complex loops (branch prediction), resulting in instructions being destroyed improperly.

I've included the ROB, Commit, Fetch, DynInst and O3CPU debug flags. There are 192 ROB entries, which is why the instList size generally maxes out at about 192 instructions. The dynamic instruction counts (seen in the DRAMSim2 plot) also seem to imply that instructions are incorrectly being removed from the ROB, and then from the CPU's instruction list in cpu.cc. That allows more and more instructions to be added to the system (possibly from a bad branch).

I appreciate any help in debugging this and further figuring out the root problem; just let me know if you need anything else from me. I don't have much more time at the moment to debug, but I can take any advice for quick changes and/or additional traces, then send the results back to the list for discussion.

Thanks,
Andrew
P.S. Paul - I did try decreasing the size of the DRAMSim2 transaction (and even command) queue from 512 to 32. The same instruction problem occurred; it basically just decreased the execution time.

On Wed, Mar 14, 2012 at 2:10 PM, Ali Saidi <sa...@umich.edu> wrote:
The error is that there are more than 1500 instructions currently in flight in the system. It could mean several things:
1. The value is somewhat arbitrarily defined, and maybe there really are more than 1500 in your system at one time?
2. Instructions aren't being destroyed correctly.
You could try to run a debug binary so you'll get a list of instructions when it happens, or increase the number, which may be appropriate for certain situations (but 1500 is quite a few in-flight instructions).
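
You can tell Ali's two explanations apart by checking whether the live count ever falls back. The sketch below assumes a single global counter and the 1500 limit he mentions. InFlightTracker and kMaxInFlight are hypothetical names, and the code does not reproduce the actual check in base_dyn_inst_impl.hh. A genuinely busy machine stays near the cap and then drains. A leak only ratchets the count upward, because the lost instructions never reach destroy().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cap taken from the 1500 figure quoted above; the real
// check in base_dyn_inst_impl.hh is not reproduced here.
constexpr std::size_t kMaxInFlight = 1500;

// One global count of live instruction objects.
struct InFlightTracker {
    std::size_t count = 0;

    // Called when an instruction is constructed. Returns false once the
    // limit is exceeded, which is where a real build would assert.
    bool create() { return ++count <= kMaxInFlight; }

    // Called when an instruction is destroyed. A leaked instruction never
    // reaches this, so its contribution to count is permanent.
    void destroy() {
        assert(count > 0);
        --count;
    }
};
```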
Ali
On 13.03.2012 10:56, Andrew Cebulski wrote:
Hi Xiangyu,
I just started looking into this some more. At first I thought it was due to updating to a more recent revision, but then I went back to revision 8643, added your patch, built and ran... and now get the error with it too (when running ARM_FS/gem5.opt). I'm testing now to see if an update to SWIG might have resulted in this error; maybe someone on the mailing list would know if that's possible. The difference is 1.3.40 vs. 2.0.3, both of which are supported according to the dependencies wiki page.
Just for completeness, here's the error from revision 8643:
build/ARM_FS/cpu/base_dyn_inst_impl.hh:149: void BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion `cpu->instcount
I have not tried running with gem5.debug, so I will be doing that today. Maybe this assertion is occurring due to an optimization. That would mean it wouldn't be triggered in gem5.debug, since it runs without optimizations. Have you tested all of debug, opt and fast with your tests?
Thanks,
Andrew
On Tue, Mar 13, 2012 at 1:37 PM, Rio Xiangyu Dong <riosher...@gmail.com> wrote:
Hi Andrew,
I didn't see this error in my simulations. May I ask which gem5 version you are using? I find that some of the latest code updates are not compatible with my changes. I am still using the DRAMSim2 patch on gem5 repo 8643, and have run all the runnable benchmarks in SPEC2006, SPEC2000, EEMBC2, and PARSEC2 on ARM_SE.
Thank you!
Best,
Xiangyu
*From:* Andrew Cebulski [mailto:af...@drexel.edu]
*Sent:* Thursday, March 08, 2012 6:52 PM
*To:* gem5 users mailing list
*Cc:* riosher...@gmail.com; sa...@umich.edu
*Subject:* Re: [gem5-users] A Patch for DRAMsim2 Integration
Xiangyu,
I've been having an issue recently with the number of instructions I've been seeing committed by the CPU (I have a separate thread on this). It turns out the issue seems to be coming from the patch you created to integrate DRAMSim2 with gem5. Unfortunately, I've been running with gem5.fast, not gem5.opt, so up until now I haven't been seeing assertions. I thought I'd run it with gem5.opt or gem5.debug back in December, but I must not have. My runs on the ARM O3 CPU fail with this assertion:
build/ARM/cpu/base_dyn_inst_impl.hh:149: void BaseDynInst::initVars() [with Impl = O3CPUImpl]: Assertion `cpu->instcount
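
This helps explain why the assertion only appeared after switching away from gem5.fast. The sketch below assumes the fast variant is compiled with NDEBUG defined and the opt and debug variants are not. Under that assumption, assert() is removed entirely from the fast build, whether or not optimization is enabled. assertionsEnabled and checkedInstCount are hypothetical names.

```cpp
#include <cassert>

// Assumption: gem5.fast is compiled with NDEBUG defined while gem5.opt
// and gem5.debug are not, so assert() is only live in the latter two.
constexpr bool assertionsEnabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// A sanity check written with assert() is removed entirely by the
// preprocessor under NDEBUG, so an over-limit count passes silently.
int checkedInstCount(int instcount, int limit) {
    assert(instcount <= limit);
    return instcount;
}
```

If this assumption holds, a build that keeps assertions enabled would report the leak in both gem5.opt and gem5.debug.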
-Andrew
Date: Sun, 18 Dec 2011 01:48:58 -0800
From: "Dong, Xiangyu" <riosher...@gmail.com>
To: "gem5 users mailing list" <gem5-users@gem5.org>
Subject: [gem5-users] A Patch for DRAMsim2 Integration
Hi all,
I have a gem5+DRAMSim2 patch. I've tested it under both SE and FS modes, and I'm willing to share it here.
For those who have such needs, please go to my website www.cse.psu.edu/~xydong to download the patch and test it. To enable DRAMSim2, use the se_dramsim2.py script instead of se.py (for FS, you can create one yourself). The basic idea for enabling the DRAMSim2 module is to use the derived DRAMMemory class instead of the PhysicalMemory class.
Please let me know if there are bugs.
Thank you!
Best,
Xiangyu Dong
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users