On 06/07/2010 17:43, Ali Saidi wrote:
On 7/6/10 1:22 PM, "Timothy M Jones"<tjon...@inf.ed.ac.uk> wrote:
Hi everyone,
For a while now I've been trying to implement SMARTS-like simulation
within M5. I'm almost there now but am stuck on one particular part.
For SMARTS simulation we repeatedly switch CPUs between Atomic (for
fast-forwarding and functional warming), then O3 (for measurements) and
back again. When switching, I change the memory mode to suit the CPU
I'm next using.
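
To make the switching pattern above concrete, here is a rough sketch of the phase schedule in plain Python. It is not M5 code: `Phase`, `smarts_schedule`, and the `period` and `measure` parameters are hypothetical names for illustration, and the CPU and memory-mode strings are only labels for the two configurations described above.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Phase:
    cpu: str        # "atomic" or "o3" (labels only, not M5 class names)
    mem_mode: str   # memory mode to select before switching this CPU in
    insts: int      # instructions to execute in this phase
    measured: bool  # True if statistics are collected in this phase


def smarts_schedule(total_insts: int, period: int, measure: int) -> Iterator[Phase]:
    """Yield alternating fast-forward and measurement phases.

    Each sampling period of `period` instructions is split into an
    Atomic phase (fast-forwarding plus functional warming) followed by
    an O3 phase of `measure` instructions. Both values are assumptions.
    """
    if not 0 < measure < period:
        raise ValueError("need 0 < measure < period")
    done = 0
    while done + period <= total_insts:
        yield Phase("atomic", "atomic", period - measure, False)
        yield Phase("o3", "timing", measure, True)
        done += period
```

Each O3 phase is one sample, so every sample costs two CPU switches, and each switch is a chance to hit a drain problem.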
One way would be to just use the atomic CPU to create a set of checkpoints
that you then run from with the O3 CPU, but that is a stopgap solution.
That is an option, but for SMARTS you typically need 8000 samples per
program. I'm looking at evaluating a compiler scheme here, so would
need that many checkpoints per benchmark per binary version. It's not
really an option, unfortunately :-(
This works fine except for a corner case. If O3 gets told to drain
whilst waiting for an instruction cache access, it may finish draining
and be switched out before the request is returned. Then Atomic is
switched in and gets a timing callback, so fails.
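
The corner case can be reproduced in a toy event-driven model. This is only a sketch under simplifying assumptions: `ToySystem`, `O3CPU`, `AtomicCPU`, and `demo_stale_response` are made-up names, and the model has one CPU slot and a single queue of pending responses rather than anything resembling the real M5 classes.

```python
import heapq


class O3CPU:
    def __init__(self):
        self.received = []

    def recv_timing(self, pkt):
        # A timing CPU expects responses to arrive later via a callback.
        self.received.append(pkt)


class AtomicCPU:
    def recv_timing(self, pkt):
        # An atomic CPU never issues timing requests, so this is an error.
        raise RuntimeError("atomic CPU received a timing response: %r" % (pkt,))


class ToySystem:
    def __init__(self):
        self.cpu = O3CPU()
        self.events = []  # heap of (tick, sequence number, packet)
        self.seq = 0

    def send_ifetch(self, now, latency, pkt):
        # The response is scheduled for delivery to whichever CPU is active then.
        heapq.heappush(self.events, (now + latency, self.seq, pkt))
        self.seq += 1

    def switch_to_atomic(self):
        # Models a drain that completes without waiting for the response.
        self.cpu = AtomicCPU()

    def run_until(self, tick):
        while self.events and self.events[0][0] <= tick:
            _, _, pkt = heapq.heappop(self.events)
            self.cpu.recv_timing(pkt)


def demo_stale_response():
    """Return True if the stale response reaches the atomic CPU."""
    system = ToySystem()
    system.send_ifetch(now=0, latency=10, pkt="ifetch")
    system.run_until(5)        # response still in flight
    system.switch_to_atomic()  # switched out too early
    try:
        system.run_until(20)   # response arrives at the wrong CPU
    except RuntimeError:
        return True
    return False
```

In this model the failure goes away if the switch is delayed until `system.events` is empty, which is the property the drain is supposed to guarantee.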
I've tried various schemes to address this (apart from simply ignoring
the timing callback in Atomic, since it seems like this shouldn't be the
place to sort this out - after all, it's not Atomic's fault it's getting
a timing callback but a problem with something that happened
beforehand). However, I can't find a solution to fix this easily.
Could someone help me out and point me in the best direction to go?
I've tried preventing O3 from draining if fetch is waiting on an
instruction cache access, but that's not always obvious to spot since
the request can be squashed leaving no trace of it.
I've tried preventing the port from signaling that it's drained until
its event queue is empty. However, this has the knock-on effect that it
is sometimes not empty when switching from Atomic to O3. Since Atomic
doesn't have a drain function, it simulates forever.
Without looking at the code I'm not exactly sure, but that would be my
suggestion. Look at the port for the icache and if it has an outstanding
memory request then don't return 0 for drain(). I'm not sure what you mean
by the event queue in this case, but the atomic CPU shouldn't have a queue
of requests in the port, or at least that shouldn't be the case.
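
A minimal sketch of that suggestion, again as a toy model rather than the real port code: `ToyPort` and its methods are hypothetical, and the only convention borrowed from the discussion is that `drain()` returning 0 means "already drained". Counting requests at the port, rather than in fetch, means a squashed fetch is still tracked until its response comes back.

```python
class ToyPort:
    def __init__(self):
        self.outstanding = 0       # requests sent but not yet answered
        self.drain_callback = None

    def send_request(self):
        self.outstanding += 1

    def recv_response(self):
        if self.outstanding == 0:
            raise RuntimeError("response without a matching request")
        self.outstanding -= 1
        # Signal the pending drain once the last response has arrived.
        if self.outstanding == 0 and self.drain_callback is not None:
            callback, self.drain_callback = self.drain_callback, None
            callback()

    def drain(self, on_drained):
        """Return 0 if drained now, else 1 and call on_drained later."""
        if self.outstanding == 0:
            return 0
        self.drain_callback = on_drained
        return 1
```

The caller would only switch CPUs once every object has either returned 0 or invoked its callback.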
The event queue I'm talking about is the one belonging to the
SimpleTimingPort object. I take your point that Atomic shouldn't have
requests stacked up in the port. I'll have a dig a little further and
see if I can work out why that is happening.
Cheers
Tim
--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev