On 01/25/2013 10:00 PM, Amin Farmahini wrote:
I have developed a model that frequently switches between cpus. To be more
specific, I switch between O3 and a cpu model of mine. After new changes to
O3 draining (http://reviews.gem5.org/r/1568/), I have encountered two
assertion failures.

1.  assert(predHist[i].empty()); in BPredUnit<Impl>::drainSanityCheck()
(src/cpu/o3/bpred_unit_impl.hh)
Prior to the new patch, we squashed the history table before switching, but as
far as I understand, we don't do so any more in the new patch. This
assertion failure happens, for example, when you switch from atomic to o3
and then from o3 to atomic.

This is a bug in the draining code. Just comment out the code in drainSanityCheck and you should be fine. I'm a bit surprised that we haven't seen this in the regressions. It seems to me that this assertion would trigger on every single O3 CPU drain/resume.
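
To illustrate why the check trips, here is a toy model, not gem5 code. ToyBPredUnit and its methods are hypothetical names made up for this sketch; only predHist mirrors the member named in the assertion. It assumes the history only becomes empty if something explicitly squashes it before the drain check runs, which is what the old code did before switching.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-in for a per-thread branch predictor history; NOT gem5 code.
struct ToyBPredUnit {
    std::vector<std::deque<unsigned>> predHist;  // one history per thread

    explicit ToyBPredUnit(std::size_t numThreads) : predHist(numThreads) {}

    // Record a predicted branch (sequence number only, for illustration).
    void predict(std::size_t tid, unsigned seqNum) {
        predHist[tid].push_back(seqNum);
    }

    // Drop all in-flight history for every thread.
    void squashAll() {
        for (auto &hist : predHist)
            hist.clear();
    }

    // True when the condition checked by the drain assertion would hold.
    bool isDrained() const {
        for (const auto &hist : predHist)
            if (!hist.empty())
                return false;
        return true;
    }
};
```

In this model, isDrained() returns false after any predict() call and only returns true again once squashAll() has run, so a drain that skips the squash would fail the check.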

2. assert(!cpu->switchedOut()); in
DefaultFetch<Impl>::processCacheCompletion (src/cpu/o3/fetch_impl.hh)
Obviously this happens when the fetch stage in O3 receives a packet from cache
(possibly after an Icache miss) while the o3 is switched out. Again,
previously, we used to detect such a situation and activate the fetch only
if no drain is pending.

I don't think this should be possible any more. If the assertion triggers, it's most likely a bug somewhere else. BaseCPU::takeOverFrom disconnects both the icache and dcache when switching between CPUs, so the CPU should never be switched out and connected to a cache at the same time. Besides, the new O3 draining should wait for /all/ outstanding requests to complete or be squashed. As far as I'm concerned, the draining code is buggy if there are still pending ifetches in a drained system.
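
The ordering I have in mind can be sketched with a toy model, again not gem5 code. ToyCpu and all of its members are hypothetical names for illustration. The sketch assumes a single counter of outstanding fetches and a single flag for the cache connection. The point is that switching out is refused while a fetch is in flight, so a response can never arrive at a switched-out CPU.

```cpp
#include <cassert>

// Toy model of the drain-then-switch ordering; NOT gem5 code.
class ToyCpu {
  public:
    // Issue an instruction fetch; only legal while active and connected.
    bool sendFetch() {
        if (switchedOut || !connected)
            return false;
        ++outstanding;
        return true;
    }

    // Cache delivers a response. Returns false in the situation the
    // assertion guards against (response while switched out), or when
    // there is no matching outstanding request.
    bool recvFetchResponse() {
        if (switchedOut || !connected || outstanding == 0)
            return false;
        --outstanding;
        return true;
    }

    // Drained means no request is still in flight.
    bool isDrained() const { return outstanding == 0; }

    // Switching out is only allowed once drained; it also disconnects
    // the cache, so the two states cannot overlap.
    bool switchOut() {
        if (!isDrained())
            return false;
        switchedOut = true;
        connected = false;
        return true;
    }

  private:
    int outstanding = 0;
    bool connected = true;
    bool switchedOut = false;
};
```

With a fetch in flight, switchOut() returns false until recvFetchResponse() has consumed the response. After a successful switchOut(), both sendFetch() and recvFetchResponse() return false because the cache is disconnected.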


I have found a solution to work around these assertion failures, and I am
not sure if this only happens to me because of the specific way I use the
O3 draining or not. I just wanted to mention these assertion failures could
be possible bugs.


The first assertion is almost definitely a bug. I suspect the second one could be due to a bug in your configuration scripts or in your CPU model. Are you using any of the example scripts? Or have you rolled your own? If so, could you send us/me a copy so I can have a look?

//Andreas

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev