Re: [gem5-dev] assertion failures after O3 draining patch (Changeset 6c9e3d624922)

Andreas Sandberg Thu, 07 Feb 2013 08:25:05 -0800

Hi Tony,

There was a small mistake (well, actually a pretty large one) in the patch I sent you. The patch actually breaks draining completely... :(


I've attached the new version of the patch. Sorry for the confusion.

I tried to reproduce the bug using the current tip/master with the patch applied, and the simulation gets stuck around tick 6901819000 instead (I used the same command line as you did). It seems to have something to do with L2 draining, but I haven't figured out the details yet.


//Andreas

On 02/05/2013 03:50 PM, Anthony Gutierrez wrote:

Hi Andreas,

The changeset I was using when I ran into this problem was this one:
http://repo.gem5.org/gem5/rev/f9e76b1eb79a.

I tried with the patch; it no longer asserts, but now the simulation seems
to hang. The kernel and disk image I am using are from:
http://gem5.org/bbench-gem5. The gingerbread image with bbench and the
kernel are there.

With the latest repo (unmodified) repeat switching also causes the
simulation to hang and never hits that assert.

Thanks,
Tony

On Mon, Feb 4, 2013 at 3:02 PM, Andreas Sandberg <[email protected]> wrote:

Hi Tony,

I had a quick look and was unable to reproduce it myself. Could you
check if it is still a problem and send me your kernel binary in that
case?

I suspect that the problem is that there are cases when we don't reset
the memReq[tid] pointer when a request has been squashed. Could you
test to see if the patch I've attached solves the issue? The fix is also
included in my gem5 fixes branch [1].
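
To illustrate the suspected failure mode, here is a minimal sketch, assuming a per-thread pointer to the outstanding fetch request. ToyFetch, Request, and every method name below are hypothetical stand-ins, not gem5's actual fetch code or the contents of the attached patch. The point is only that a squash has to clear the pointer, so that a late response is ignored and the stage can report itself as drained.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory request; not gem5's Request class.
struct Request {
    unsigned long addr;
};

// Toy fetch stage that tracks one outstanding request per thread.
class ToyFetch {
  public:
    explicit ToyFetch(std::size_t threads) : memReq(threads) {}

    // Issue a fetch and remember it as the outstanding request for tid.
    void issue(std::size_t tid, unsigned long addr) {
        memReq[tid] = std::make_shared<Request>(Request{addr});
    }

    // Squash: the pointer must be reset here, otherwise the stage keeps
    // believing that a request is still in flight.
    void squash(std::size_t tid) { memReq[tid].reset(); }

    // Handle a response. Returns true only if it matches the outstanding
    // request; stale responses to squashed requests are dropped.
    bool completion(std::size_t tid, const std::shared_ptr<Request> &resp) {
        if (!memReq[tid] || memReq[tid] != resp)
            return false;
        memReq[tid].reset();
        return true;
    }

    // The stage counts as drained when no thread has a request in flight.
    bool isDrained() const {
        for (const auto &req : memReq) {
            if (req)
                return false;
        }
        return true;
    }

    std::shared_ptr<Request> outstanding(std::size_t tid) const {
        return memReq[tid];
    }

  private:
    std::vector<std::shared_ptr<Request>> memReq;
};
```

In this sketch, leaving out the reset() call in squash() would make isDrained() return false forever after a squash.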

//Andreas

[1] https://github.com/andysan/gem5

On Mon, 2013-01-28 at 11:08 -0500, Anthony Gutierrez wrote:

Hey Andreas,

Do you have any idea about this problem:

http://www.mail-archive.com/[email protected]/msg06550.html

Thanks,
Tony

On Mon, Jan 28, 2013 at 4:31 AM, Andreas Sandberg <[email protected]> wrote:

On 01/25/2013 10:00 PM, Amin Farmahini wrote:

I have developed a model that frequently switches between cpus. To be more
specific, I switch between O3 and a cpu model of mine. After new changes to
O3 draining (http://reviews.gem5.org/r/1568/), I have encountered two
assertion failures.

1. assert(predHist[i].empty()); in BPredUnit<Impl>::drainSanityCheck()
(src/cpu/o3/bpred_unit_impl.hh)
Prior to the new patch, we squashed the history table before switching, but as
far as I understand, we don't do so any more in the new patch. This
assertion failure happens, for example, when you switch from atomic to o3
and then from o3 to atomic.


This is a bug in the draining code. Just comment out the code in
drainSanityCheck and you should be fine. I'm a bit surprised that we
haven't seen this in the regressions; it seems to me that this assertion
would trigger on every single O3 CPU drain/resume.
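
For readers unfamiliar with the check, here is a minimal sketch of the situation, assuming a per-thread history of in-flight predictions. ToyBPred and its members are hypothetical and only loosely modelled on the assertion quoted above; this is not gem5's BPredUnit. It shows why the sanity check can only pass if the history is squashed (or otherwise emptied) before the drain completes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical record of one in-flight prediction.
struct PredictorHistory {
    unsigned long seqNum;
    bool predTaken;
};

// Toy branch predictor with one history queue per thread.
class ToyBPred {
  public:
    explicit ToyBPred(std::size_t threads) : predHist(threads) {}

    // Each prediction leaves an entry until it is committed or squashed.
    void predict(std::size_t tid, unsigned long seqNum, bool taken) {
        predHist[tid].push_front(PredictorHistory{seqNum, taken});
    }

    // Drop every in-flight prediction of a thread.
    void squashAll(std::size_t tid) { predHist[tid].clear(); }

    // Non-aborting variant of the check, convenient for experiments.
    bool historyEmpty() const {
        for (const auto &hist : predHist) {
            if (!hist.empty())
                return false;
        }
        return true;
    }

    // Mirrors the shape of the failing assertion: every per-thread
    // history must be empty once the CPU claims to be drained.
    void drainSanityCheck() const {
        for (const auto &hist : predHist) {
            (void)hist; // avoid an unused warning when NDEBUG is set
            assert(hist.empty());
        }
    }

  private:
    std::vector<std::deque<PredictorHistory>> predHist;
};
```

In this toy model, calling drainSanityCheck() after predict() without an intervening squashAll() aborts, which matches the symptom described in the report.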


2. assert(!cpu->switchedOut()); in DefaultFetch<Impl>::processCacheCompletion
(src/cpu/o3/fetch_impl.hh)
Obviously this happens when the fetch stage in O3 receives a packet from the
cache (possibly after an Icache miss) while the o3 is switched out. Again,
previously, we used to detect such a situation and activate the fetch only
if no drain is pending.


I don't think this should be possible any more; it's most likely a bug
somewhere else if the assertion triggers. BaseCPU::takeOverFrom disconnects
both the icache and dcache when switching between CPUs, so the CPU should
never be switched out and connected to a cache at the same time. Besides,
the new O3 draining should wait for /all/ outstanding requests to complete
or be squashed. As far as I'm concerned, the draining code is buggy if
there are still pending ifetches in a drained system.
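
The invariant described here can be sketched as follows. This is a toy model under stated assumptions: ToyCpu and all of its methods are hypothetical and do not reflect gem5's Drainable interface or the O3 implementation. It only shows the intended ordering, in which a CPU may be switched out only after a drain has been requested and every outstanding request has completed or been squashed.

```cpp
#include <cassert>

// Hypothetical toy CPU; names do not correspond to gem5's real classes.
class ToyCpu {
  public:
    // Send an instruction fetch to the (imaginary) cache.
    void sendFetch() { ++outstanding; }

    // A response arriving while switched out would indicate a draining
    // bug, which is what the second assertion in the report guards.
    void recvResponse() {
        assert(!switchedOut);
        if (outstanding > 0)
            --outstanding;
    }

    // Squashing a request also retires it from the outstanding count.
    void squashOne() {
        if (outstanding > 0)
            --outstanding;
    }

    // Request a drain. Returns true if the CPU is already drained.
    bool drain() {
        drainPending = true;
        return isDrained();
    }

    bool isDrained() const { return outstanding == 0; }

    // Switching out is refused until a drain was requested and finished.
    bool switchOut() {
        if (!drainPending || !isDrained())
            return false;
        switchedOut = true;
        return true;
    }

    bool isSwitchedOut() const { return switchedOut; }

  private:
    unsigned outstanding = 0;
    bool drainPending = false;
    bool switchedOut = false;
};
```

With this ordering, recvResponse() can never run on a switched-out CPU unless the drain logic itself miscounts outstanding requests.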



I have found a solution to work around these assertion failures, and I am
not sure if this only happens to me because of the specific way I use the
O3 draining or not. I just wanted to mention that these assertion failures
could be bugs.

The first assertion is almost definitely a bug. I suspect the second one
could be due to a bug in your configuration scripts or in your CPU model.
Are you using any of the example scripts? Or have you rolled your own? If
so, could you send us/me a copy so I can have a look?

//Andreas


_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

